Transpose Matrices¶
```python
import numpy as np

# Transpose matrix m: rows become columns and columns become rows.
m = np.array([[1, 2, 3], [4, 5, 6]])
m_t = np.transpose(m)  # [[1, 4], [2, 5], [3, 6]]
```
The Neural Network Concept Viewed as a Graph¶
- Neurons are like nodes in a graph.
- Weights are like edges connecting the nodes.
- Activation functions are like the nodes' output functions.
- Loss functions are like the cost functions.
- Optimizers are like the algorithms to minimize the cost functions.
Useful Analogy: Comparing TensorFlow Neural Networks to Execution Graphs¶
| Aspect | Neural Network | Execution Graph |
|---|---|---|
| Structure | Layers (input, hidden, output) connected by weights. | Visual representation of computations. |
| Nodes | Neurons that perform computations. | Operations (e.g., addition, multiplication). |
| Edges | Weights that connect neurons. | Data paths that carry data structures (tensors) between operations. |
| Data Flow | Data (tensors) flows from the input layer, through hidden layers, to the output layer. | Data flows from one operation to another, following the graph's structure. |
| Training | Adjusts the weights to minimize error. | Updates the graph's parameters to optimize performance. |
| Example | Input Layer -> Hidden Layer -> Output Layer | Operation A -> Operation B -> Operation C |
| Visual Representation | Input -> [Weights] -> Hidden -> [Weights] -> Output | Node A -> [Data] -> Node B -> [Data] -> Node C |
Why Are Activation Functions Applied to Tensors?¶
Activation functions are applied to the tensors at each layer of a neural network. They introduce non-linearity into the model, which lets the network learn patterns that a purely linear model cannot. The most common activation functions used in deep learning are:
- Sigmoid: `tf.keras.activations.sigmoid` (outputs 0 to 1); used in the output layer of a binary classification problem.
- Tanh: `tf.keras.activations.tanh` (outputs -1 to 1).
- ReLU: `tf.keras.activations.relu` (outputs 0 to infinity); used in the hidden layers of a neural network.
- Leaky ReLU: `tf.keras.layers.LeakyReLU` (outputs -infinity to infinity; negative inputs are scaled by a small slope instead of being zeroed).
- Softmax: `tf.keras.activations.softmax` (outputs 0 to 1 and sums to 1); used in the output layer of a multi-class classification problem.
Reference implementations in Python (NumPy):
```python
import numpy as np

def sigmoid(x): return 1 / (1 + np.exp(-x))
def relu(x): return np.maximum(0, x)
def softmax(x): exp_x = np.exp(x - np.max(x)); return exp_x / exp_x.sum()  # subtract max for numerical stability
```
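Tanh and Leaky ReLU from the list above can be sketched the same way; the 0.01 negative slope is a common default used here as an assumption:

```python
import numpy as np

def tanh(x): return np.tanh(x)
def leaky_relu(x, alpha=0.01): return np.where(x > 0, x, alpha * x)
```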
Loss Functions¶
Loss functions measure the error between the predicted value and the actual value. This error signal is what drives the weight updates that minimize the loss during training. Here are the most common loss functions used in deep learning:
1. Mean Squared Error: `tf.keras.losses.mean_squared_error`
2. Binary Crossentropy: `tf.keras.losses.binary_crossentropy`
3. Categorical Crossentropy: `tf.keras.losses.categorical_crossentropy`
4. Sparse Categorical Crossentropy: `tf.keras.losses.sparse_categorical_crossentropy`
5. Hinge: `tf.keras.losses.hinge`
Implementing MSE in Python (NumPy):
```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between targets and predictions.
    return np.mean(np.square(y_true - y_pred))
```
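A similar sketch for binary crossentropy is shown below; the clipping constant `eps` is an assumption added for numerical stability, and this is a simplified version rather than the exact Keras implementation:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 and 1 to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```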
Optimizers¶
Optimizers update the weights of the model in order to minimize the loss function.
Here are the most common optimizers used in deep learning:
- SGD: `tf.keras.optimizers.SGD`
- RMSprop: `tf.keras.optimizers.RMSprop`
- Adagrad: `tf.keras.optimizers.Adagrad`
- Adadelta: `tf.keras.optimizers.Adadelta`
- Adam: `tf.keras.optimizers.Adam`
Implementing the update rules in Python (NumPy):
```python
import numpy as np

def gradient_descent(weights, learning_rate, gradients):
    # Plain gradient descent: step against the gradient.
    return weights - learning_rate * gradients

def adam(weights, learning_rate, gradients, m, v, t=1, beta1=0.9, beta2=0.999, epsilon=1e-8):
    # Adam: exponentially weighted first and second moments with bias correction.
    m = beta1 * m + (1 - beta1) * gradients
    v = beta2 * v + (1 - beta2) * gradients**2
    m_hat = m / (1 - beta1**t)  # bias correction uses the timestep t
    v_hat = v / (1 - beta2**t)
    return weights - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon), m, v
```
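An illustrative call of these update rules on a toy parameter vector (all values are made up):

```python
import numpy as np

# Toy parameter vector and gradient, for illustration only.
weights = np.array([0.5, -0.3])
gradients = np.array([0.1, -0.2])

# One plain gradient descent step.
weights = gradient_descent(weights, learning_rate=0.01, gradients=gradients)

# One Adam step, carrying the moment estimates m and v between calls.
m = np.zeros_like(weights)
v = np.zeros_like(weights)
weights, m, v = adam(weights, learning_rate=0.001, gradients=gradients, m=m, v=v, t=1)
```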
Regularization¶
Regularization reduces model complexity and prevents overfitting by adding a penalty to the loss function.
Types of Regularization:
- L1 Regularization: adds a penalty proportional to the absolute values of the weights. `tf.keras.regularizers.L1(l1=0.01)`
- L2 Regularization: adds a penalty proportional to the squared values of the weights. `tf.keras.regularizers.L2(l2=0.01)`
- L1 and L2 Regularization: combines both L1 and L2 penalties. `tf.keras.regularizers.L1L2(l1=0.01, l2=0.01)`
Usage in Layers:
- Apply regularization to layers such as `Dense` and `Conv2D`:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.L2(0.01)),
    tf.keras.layers.Dense(10, activation='softmax')
])
```
Impact on Training:
- Regularization terms are added to the loss function to control model complexity.
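As a rough sketch of that idea (not the exact Keras computation), the penalty terms are simply added to the data loss:

```python
import numpy as np

# Hypothetical weight matrix and data loss, for illustration only.
w = np.random.randn(784, 128)
data_loss = 0.35

l1, l2 = 0.01, 0.01
total_loss = data_loss + l1 * np.sum(np.abs(w)) + l2 * np.sum(np.square(w))
```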
Hyperparameter Tuning:
- Adjust `l1` and `l2` to balance underfitting and overfitting.
Dropout:
- Randomly ignores neurons during training: `tf.keras.layers.Dropout(rate=0.5)`
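A minimal sketch of where a `Dropout` layer typically sits in a model (the layer sizes and the 0.5 rate are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(rate=0.5),  # randomly zeroes 50% of activations during training
    tf.keras.layers.Dense(10, activation='softmax')
])
```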
Backpropagation¶
Backpropagation computes the gradient of the loss function with respect to each weight by applying the chain rule backwards through the network; these gradients are then used to update the weights.
```python
def backpropagation(weights, learning_rate, gradients):
    # Apply the computed gradients in a gradient descent step.
    return weights - learning_rate * gradients
```
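For a fuller picture, here is a minimal sketch of backpropagation through the one-hidden-layer network used later in this section. It assumes a softmax output with a cross-entropy loss (which gives the simple `y_pred - y_true` gradient at the output) and processes a single example; the function name and signature are illustrative:

```python
import numpy as np

def backprop_one_hidden_layer(x, y_true, W1, b1, W2, b2, learning_rate=0.01):
    # Forward pass for a single example.
    z1 = x @ W1 + b1
    h = np.maximum(0, z1)             # ReLU
    z2 = h @ W2 + b2
    exp_z = np.exp(z2 - np.max(z2))
    y_pred = exp_z / exp_z.sum()      # softmax

    # Backward pass (chain rule), assuming cross-entropy loss.
    dz2 = y_pred - y_true             # gradient of the loss w.r.t. the output logits
    dW2 = np.outer(h, dz2)
    db2 = dz2
    dh = W2 @ dz2
    dz1 = dh * (z1 > 0)               # ReLU derivative
    dW1 = np.outer(x, dz1)
    db1 = dz1

    # Gradient descent update.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2
```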
Mathematics¶
Traversing a Neural Network with Mathematical Operations¶
A neural network can be represented mathematically using matrices and vectors.
For example, the bias vector of a 10-neuron output layer can be written as:
$$
\mathbf{b}_2 = \begin{bmatrix}
b_{2,1} \\
b_{2,2} \\
\vdots \\
b_{2,10}
\end{bmatrix}
$$
Neural Network Tree Representation¶
Forward propagation in a neural network can be represented as a tree-like structure with three layers: input, hidden, and output.
Here is a graph of a simple neural network with one hidden layer; the mathematical operations for each node and edge are included.
```plaintext
A1   A2   A3      Input layer
  \\  |  //
   \| X |/
  B1      B2      Hidden layer
     \   /
      C1          Output layer
```
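A tiny NumPy sketch of the forward pass through this 3-2-1 network; the weight values, the ReLU hidden activation, and the sigmoid output are illustrative assumptions rather than anything prescribed by the diagram:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Input layer: A1, A2, A3
a = np.array([0.5, -1.0, 2.0])

# Hidden layer: B1, B2 (each hidden node is connected to all three inputs)
W_hidden = np.random.randn(3, 2)
b_hidden = np.zeros(2)
b_layer = np.maximum(0, a @ W_hidden + b_hidden)  # ReLU

# Output layer: C1
W_out = np.random.randn(2, 1)
b_out = np.zeros(1)
c = sigmoid(b_layer @ W_out + b_out)
```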
Pseudocode¶
1. Define the input layer.
2. Define the hidden layer with a specified number of neurons and an activation function.
3. Define the output layer with the number of neurons corresponding to the number of classes and an activation function.
4. Compile the model with a loss function and an optimizer.
5. Train the model on the dataset.
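A minimal Keras sketch that follows these five steps; the layer sizes, activations, loss, and optimizer are illustrative choices, not the only valid ones:

```python
import tensorflow as tf

# 1-3. Define the input, hidden, and output layers.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# 4. Compile the model with a loss function and an optimizer.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# 5. Train the model on the dataset (x_train and y_train are assumed to exist).
# model.fit(x_train, y_train, epochs=5, batch_size=32)
```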
Mathematical Representation using Matrices¶
Input Layer:
- Neurons: Nodes that receive input data.
$$ \mathbf{x} \in \mathbb{R}^{784} $$
Hidden Layer:
- Weights: $ \mathbf{W}_1 \in \mathbb{R}^{784 \times 128} $
$$
\mathbf{W}_1 = \begin{bmatrix}
w_{11} & w_{12} & \cdots & w_{1,128} \\
w_{21} & w_{22} & \cdots & w_{2,128} \\
\vdots & \vdots & \ddots & \vdots \\
w_{784,1} & w_{784,2} & \cdots & w_{784,128}
\end{bmatrix}
$$
- Biases:
$$
\mathbf{b}_1 = \begin{bmatrix}
b_{11} \\
b_{12} \\
\vdots \\
b_{128}
\end{bmatrix}
$$
- Activation:
$$
\mathbf{h} = \text{ReLU}(\mathbf{W}_1^\top \mathbf{x} + \mathbf{b}_1)
$$
Output Layer:
$$ \mathbf{y} = \text{softmax}(\mathbf{W}_2^\top \mathbf{h} + \mathbf{b}_2) $$
Explanation Beyond the Maths¶
- Input Layer: Takes an input vector $\mathbf{x}$ of size 784.
- Hidden Layer: Applies a linear transformation followed by a ReLU activation function.
- Output Layer: Applies another linear transformation followed by a softmax activation function to produce a probability distribution over the 10 classes.
Neural Network Layers¶
| Layer | Pseudocode | Mathematical Representation | Explanation Beyond the Maths |
|---|---|---|---|
| Input Layer | `x = np.random.randn(input_size)` | $\mathbf{x} \in \mathbb{R}^{784}$ | Takes an input vector $\mathbf{x}$ of size 784, representing a 28x28 pixel image. |
| Hidden Layer | `h = relu(np.dot(x, weight1) + bias1)` | $\mathbf{h} = \text{ReLU}(\mathbf{W}_1^\top \mathbf{x} + \mathbf{b}_1)$ | Applies a linear transformation followed by a ReLU activation function. |
| Output Layer | `y = softmax(np.dot(h, weight2) + bias2)` | $\mathbf{y} = \text{softmax}(\mathbf{W}_2^\top \mathbf{h} + \mathbf{b}_2)$ | Applies another linear transformation followed by a softmax activation function. |
Mathematical Representation Details¶
| Component | Pseudocode | Mathematical Representation | Explanation Beyond the Maths |
|---|---|---|---|
| Weights (Hidden Layer) | `weight1 = np.random.randn(input_size, hidden_layer_size)` | $\mathbf{W}_1 \in \mathbb{R}^{784 \times 128}$ | Matrix of weights connecting the input layer to the hidden layer. |
| Biases (Hidden Layer) | `bias1 = np.random.randn(hidden_layer_size)` | $\mathbf{b}_1 \in \mathbb{R}^{128}$ | Bias vector added to the hidden layer. |
| Weights (Output Layer) | `weight2 = np.random.randn(hidden_layer_size, output_size)` | $\mathbf{W}_2 \in \mathbb{R}^{128 \times 10}$ | Matrix of weights connecting the hidden layer to the output layer. |
| Biases (Output Layer) | `bias2 = np.random.randn(output_size)` | $\mathbf{b}_2 \in \mathbb{R}^{10}$ | Bias vector added to the output layer. |
Example¶
For a digit recognition problem, the input layer receives a 784-dimensional vector representing a 28x28 pixel image. The hidden layer applies a linear transformation followed by a ReLU activation function. The output layer applies another linear transformation followed by a softmax activation function to produce a probability distribution over the 10 classes (digits 0-9).
Training a Neural Network¶
Training involves feeding input data through the network (the forward pass), measuring the error with a loss function, and updating the weights to reduce that error. The snippet below sets up the network and runs a single forward pass:
```python
import numpy as np

# Define the layer sizes
input_size = 784
hidden_layer_size = 128
output_size = 10

# Initialize weights and biases
weight1 = np.random.randn(input_size, hidden_layer_size)  # Weights for input to hidden layer
bias1 = np.random.randn(hidden_layer_size)                # Biases for hidden layer
weight2 = np.random.randn(hidden_layer_size, output_size) # Weights for hidden to output layer
bias2 = np.random.randn(output_size)                      # Biases for output layer

# Activation functions
def relu(x):
    return np.maximum(0, x)

def softmax(x):
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum(axis=0)

# Input vector
x = np.random.randn(input_size)

# Forward pass
# Hidden layer computation
h = relu(np.dot(x, weight1) + bias1)

# Output layer computation
y = softmax(np.dot(h, weight2) + bias2)
```
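To go from a single forward pass to actual training, here is a minimal Keras sketch that trains the same 784-128-10 architecture on random stand-in data; the data, number of epochs, and batch size are illustrative:

```python
import numpy as np
import tensorflow as tf

# Random stand-in data (in practice, use a real dataset such as MNIST).
x_train = np.random.randn(1000, 784).astype('float32')
y_train = np.random.randint(0, 10, size=1000)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=2, batch_size=32)
```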
Conclusion¶
Deep Learning is a subset of Machine Learning that uses neural networks to model complex patterns in data. It involves activation functions, loss functions, optimizers, regularization, backpropagation, and training, and at its core it is mostly linear algebra translated into Python.