Deep Learning is a subfield of Machine Learning inspired by the human brain. It uses structures called Artificial Neural Networks (ANNs) to learn complex patterns from large amounts of data.
Deep Learning powers many modern technologies such as image recognition, speech assistants, machine translation, and recommendation systems.
At its heart, deep learning is about learning representations automatically instead of manually crafting features.
A neural network is a computational model made up of interconnected units called neurons, organized in layers. These neurons process input data, transform it, and produce output predictions.
A neural network tries to mimic how biological neurons work, but in a mathematical way.
A basic neural network consists of an input layer, one or more hidden layers, and an output layer.
Traditional algorithms require manual feature engineering.
Neural networks learn features automatically from data.
Example: to classify images of cats, a traditional pipeline needs hand-crafted edge and texture detectors, while a neural network discovers useful edges, shapes, and higher-level parts on its own during training.
The perceptron is the simplest form of a neural network and the building block of deep learning.
It represents a single artificial neuron.
A perceptron takes several numeric inputs, multiplies each by a weight, sums the results together with a bias, and passes that sum through an activation function to produce an output.
Mathematically:
Output = Activation(Σ (input × weight) + bias)
Imagine deciding whether to approve a loan based on inputs such as income, credit score, and existing debt, each weighted by its importance.
The perceptron combines these weighted inputs to output a single decision: approve (1) or reject (0).
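The loan decision above can be sketched as a perceptron in plain Python; the weights, bias, and feature values here are made-up illustrative numbers, not a real scoring model:

```python
def step(x):
    # Step activation: output 1 (approve) if the weighted evidence is positive
    return 1 if x > 0 else 0

def perceptron(inputs, weights, bias):
    # Output = Activation(sum(input * weight) + bias)
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return step(weighted_sum)

# Hypothetical, already-normalized applicant features:
# [income, credit_score, existing_debt]
weights = [0.6, 0.8, -0.9]   # debt counts against approval
bias = -0.5                  # overall approval threshold

print(perceptron([0.9, 0.8, 0.1], weights, bias))  # strong applicant -> 1
print(perceptron([0.2, 0.3, 0.9], weights, bias))  # weak applicant -> 0
```

Changing a weight changes how much that input matters; learning is the process of finding weights that produce the right decisions.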
A single perceptron, however, can only learn linearly separable patterns; it famously cannot solve the XOR problem. This limitation led to multi-layer neural networks.
Without activation functions, a stack of layers collapses into a single linear transformation, no matter how deep the network is.
Activation functions introduce non-linearity, allowing networks to learn complex patterns.
Sigmoid squashes any input into the range (0, 1).
Used in: output layers for binary classification, where the result is read as a probability.
Limitation: it saturates for large positive or negative inputs, so gradients vanish and learning stalls.
Tanh squashes inputs into (−1, 1) and is zero-centered.
Better than sigmoid but still suffers from vanishing gradients.
ReLU (Rectified Linear Unit):
f(x) = max(0, x)
Advantages: very cheap to compute, and gradients do not vanish for positive inputs.
Used in: the hidden layers of most modern networks.
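The three activations can be sketched in plain Python using their standard definitions:

```python
import math

def sigmoid(x):
    # Squashes any real input into (0, 1)
    return 1 / (1 + math.exp(-x))

def tanh(x):
    # Zero-centered squashing into (-1, 1)
    return math.tanh(x)

def relu(x):
    # f(x) = max(0, x): identity for positive inputs, zero otherwise
    return max(0.0, x)

for x in (-5, 0, 5):
    print(x, round(sigmoid(x), 3), round(tanh(x), 3), relu(x))
```

Note how sigmoid(-5) and sigmoid(5) are already close to 0 and 1: further changes in x barely move the output, which is exactly the saturation that causes vanishing gradients.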
Forward propagation and backpropagation are the core learning mechanism of neural networks.
Forward propagation is the process of passing input data through the network, layer by layer, until it produces a prediction.
Flow:
Input → Hidden Layers → Output
Each neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function.
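The flow above can be sketched as a tiny two-layer network in plain Python; all weight and bias values are arbitrary illustrative numbers:

```python
import math

def dense(inputs, weights, biases, activation):
    # One layer: each neuron computes activation(weighted sum of inputs + bias)
    return [activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

x = [0.5, 0.2]                                              # input
h = dense(x, [[0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1], relu)   # hidden layer
y = dense(h, [[0.6, -0.5]], [0.2], sigmoid)                 # output layer
print(y)  # a single value between 0 and 1
```

The output of each layer becomes the input of the next, which is exactly the Input → Hidden Layers → Output flow.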
Backpropagation is the process of learning from mistakes.
Steps: compute a prediction with a forward pass, measure the error with the loss function, propagate that error backward through the network, compute how much each weight contributed to it, and update the weights accordingly.
This is done using calculus (chain rule).
Without backpropagation, the network would have no way of knowing which weights caused its errors.
Backpropagation allows the network to assign blame to each weight and adjust it in the direction that reduces the loss.
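The chain-rule update can be shown on the smallest possible network: a single neuron with one weight and one bias, trained on one made-up data point:

```python
# Model: y_hat = w * x + b, loss = (y_hat - y)**2 (all numbers illustrative)
w, b = 0.5, 0.0
x, y = 2.0, 3.0
lr = 0.1  # learning rate

for _ in range(50):
    y_hat = w * x + b
    # Chain rule: dL/dw = dL/dy_hat * dy_hat/dw = 2*(y_hat - y) * x
    grad_w = 2 * (y_hat - y) * x
    grad_b = 2 * (y_hat - y)
    w -= lr * grad_w  # step opposite the gradient to reduce the loss
    b -= lr * grad_b

print(round(w * x + b, 3))  # -> 3.0, the prediction now matches the target
```

Real backpropagation applies this same chain-rule bookkeeping through every layer, but the principle is identical.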
A loss function measures how wrong the model’s predictions are.
It provides a numerical value representing prediction error.
The goal of training is to minimize the loss function.
Mean Squared Error (MSE)
Used for: regression tasks.
Measures the average squared difference between predicted and actual values.
Mean Absolute Error (MAE)
Less sensitive to outliers than MSE.
Used for: regression when the data contains outliers.
Cross-Entropy
Measures how close predicted probabilities are to actual labels.
Used for: classification tasks.
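The three losses can be sketched in plain Python; the data is made up, and the single outlier (100 vs 10) shows why MSE is more outlier-sensitive than MAE:

```python
import math

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared differences
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean Absolute Error: average of absolute differences
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_prob):
    # Heavily penalizes confident wrong probabilities
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_prob)) / len(y_true)

y_true = [3.0, 5.0, 100.0]   # one large outlier
y_pred = [2.5, 5.5, 10.0]
print(mse(y_true, y_pred), mae(y_true, y_pred))  # MSE blows up, MAE stays modest
print(binary_cross_entropy([1, 0], [0.9, 0.2]))  # small loss for good probabilities
```

Squaring makes the 90-unit outlier error contribute 8100 to MSE but only 90 to MAE, which is the whole difference between the two.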
Choosing the wrong loss function can make training slow or unstable, or optimize the wrong objective; for example, MSE on a classification problem learns much more slowly than cross-entropy.
An optimizer decides how the network updates its weights to minimize loss.
It controls how large each update step is (via the learning rate) and in what direction the weights move.
Backpropagation calculates gradients, but optimizers decide how those gradients are turned into actual weight updates.
Adam is popular because it combines momentum with a per-parameter adaptive learning rate, converges quickly, and usually works well with its default settings.
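A sketch contrasting plain gradient descent with an Adam-style update on a toy one-parameter problem, minimizing f(w) = (w − 3)²; beta1, beta2, and eps below are Adam's usual default values:

```python
import math

def grad(w):
    # Gradient of f(w) = (w - 3)^2
    return 2 * (w - 3)

# Plain gradient descent: fixed step size times the raw gradient
w = 0.0
for _ in range(100):
    w -= 0.1 * grad(w)
sgd_result = w

# Adam-style update: momentum (m) plus per-parameter scaling (v)
w, m, v = 0.0, 0.0, 0.0
beta1, beta2, eps, lr = 0.9, 0.999, 1e-8, 0.1
for t in range(1, 501):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g       # running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)          # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)

print(round(sgd_result, 4), round(w, 4))  # both approach the minimum at 3
```

The division by the square root of v is what makes Adam's step size adaptive: parameters with consistently large gradients get smaller effective steps.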