
Deep Learning with TensorFlow / PyTorch

Building deep learning models involves using specialized frameworks that handle the heavy lifting of gradient computation and hardware optimization. The two industry giants are TensorFlow (Google) and PyTorch (Meta).


1. Tensor Basics

At the heart of both frameworks is the Tensor. Think of a tensor as a multi-dimensional array, similar to NumPy’s ndarray, but with a superpower: it can live on a GPU for massive speedups.

  • Scalar: A single number (0D Tensor).
  • Vector: A list of numbers (1D Tensor).
  • Matrix: A grid of numbers (2D Tensor).
  • Tensor: $n$-dimensional (e.g., a color image is a 3D tensor: height, width, and 3 color channels).

Key Feature: Tensors track their own gradients. When you perform an operation (like addition or multiplication), the framework builds a “Computational Graph” that allows it to calculate derivatives automatically (Autograd).
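For instance, here is a minimal PyTorch sketch of the ideas above (tensor shapes plus automatic differentiation); the specific shapes and values are illustrative, not prescribed by the text:

```python
# Tensors of increasing dimensionality, plus a tiny autograd example.
import torch

scalar = torch.tensor(3.14)               # 0D tensor (a single number)
vector = torch.tensor([1.0, 2.0, 3.0])    # 1D tensor (a list of numbers)
matrix = torch.rand(2, 3)                 # 2D tensor (a grid of numbers)
image  = torch.rand(224, 224, 3)          # 3D tensor (height, width, color channels)

# Autograd: a tensor with requires_grad=True records the operations applied
# to it, building a computational graph that can be differentiated.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x          # forward: builds the graph
y.backward()                # backward: compute dy/dx automatically
print(x.grad)               # dy/dx = 2x + 3 = 7.0 at x = 2
```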


2. Building Neural Networks

There are two main ways to build models:

Sequential API (Simple)

Ideal for simple stacks of layers where each layer has exactly one input and one output.

  • Example: Input -> Dense -> ReLU -> Dense -> Softmax.
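A minimal sketch of that exact stack using the Keras Sequential API; the layer sizes (784 inputs, 128 hidden units, 10 output classes) are illustrative assumptions:

```python
import tensorflow as tf

# Input -> Dense -> ReLU -> Dense -> Softmax as a simple stack of layers.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```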

Functional / Subclassing API (Complex)

Used for models with multiple inputs, multiple outputs, or shared layers (like Residual Networks).

  • Example: You define a class and write a forward() (PyTorch) or call() (TensorFlow) method to specify exactly how data flows.
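As a sketch of the subclassing style, here is a tiny residual block in PyTorch, where forward() spells out the data flow explicitly (the layer width of 64 is an illustrative assumption):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        # Ordinary Python controls the data flow: transform, then add the
        # skip connection (the defining trick of Residual Networks).
        out = self.act(self.fc1(x))
        out = self.fc2(out)
        return self.act(out + x)

block = ResidualBlock()
print(block(torch.rand(32, 64)).shape)   # torch.Size([32, 64])
```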

3. Training Deep Models

The training loop is the “heartbeat” of your ML project. It consistently follows these steps:

  1. Forward Pass: Pass a batch of data through the model to get predictions.
  2. Loss Calculation: Compare predictions to actual labels using a loss function (e.g., CrossEntropy).
  3. Backward Pass (Backprop): Calculate the gradient of the loss with respect to every weight.
  4. Optimizer Step: Use the gradients to update the weights (e.g., using the Adam optimizer).
  5. Zero Gradients: Clear the gradients for the next batch so they don’t accumulate.
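The sketch below walks through these five steps in a bare PyTorch training loop. The model, the fake data, and the hyperparameters are illustrative placeholders, not part of the original text:

```python
import torch
import torch.nn as nn

# Illustrative model and fake data; replace with your own.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
data = [(torch.rand(16, 20), torch.randint(0, 3, (16,))) for _ in range(8)]

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    for inputs, labels in data:
        outputs = model(inputs)            # 1. Forward pass
        loss = criterion(outputs, labels)  # 2. Loss calculation
        loss.backward()                    # 3. Backward pass (backprop)
        optimizer.step()                   # 4. Optimizer step: update weights
        optimizer.zero_grad()              # 5. Zero gradients so they don't accumulate
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```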

4. Model Optimization

Deep models are prone to overfitting and slow convergence. We use several techniques to fix this:

  • Dropout: Randomly “turns off” a percentage of neurons during training. This forces the network to not rely on any single neuron, making it more robust.
  • Batch Normalization: Normalizes the inputs of each layer. This stabilizes the learning process and allows for higher learning rates.
  • Weight Initialization: Starting weights at carefully scaled random values (like He or Xavier initialization) to prevent gradients from vanishing or exploding early in training.
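A minimal PyTorch sketch that combines all three techniques; the layer sizes and dropout rate are illustrative assumptions:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),   # Batch Normalization: normalize this layer's inputs
    nn.ReLU(),
    nn.Dropout(p=0.5),     # Dropout: randomly zero 50% of activations during training
    nn.Linear(256, 10),
)

# Weight Initialization: apply He (Kaiming) init to every linear layer.
def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

model.apply(init_weights)
```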

5. Callbacks & Checkpoints

Training can take hours or days. You don’t want to sit and watch the screen, nor do you want to lose progress if the power goes out.

  • Model Checkpoints: Automatically save the “best” version of your model weights based on validation accuracy. If the model starts overfitting, you still have the best version saved.
  • Early Stopping: Automatically stops training when the validation loss stops improving. This saves time and prevents overfitting.
  • Learning Rate Schedulers: Slowly reduce the learning rate as training progresses (like slowing down a car as you approach a stop sign) to help the model settle into a good minimum.
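With the Keras API these map onto built-in callbacks, as in the sketch below (the file name, monitored metrics, and patience values are illustrative, and ReduceLROnPlateau is just one kind of learning-rate scheduler):

```python
import tensorflow as tf

callbacks = [
    # Save the best weights seen so far, judged by validation accuracy.
    tf.keras.callbacks.ModelCheckpoint("best_model.keras",
                                       monitor="val_accuracy",
                                       save_best_only=True),
    # Stop training once validation loss stops improving.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    # Shrink the learning rate when validation loss plateaus.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                                         patience=3),
]

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=callbacks)
```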

6. GPU Acceleration Basics

Deep learning is essentially billions of matrix multiplications. A CPU handles tasks sequentially, while a GPU (Graphics Processing Unit) can handle thousands of small tasks simultaneously.

  • CUDA: A platform created by NVIDIA that allows software to use the GPU for general-purpose processing.
  • Memory Management: You must explicitly move both your Model and your Data to the GPU memory (.to('cuda') in PyTorch).
  • Bottlenecks: Often, the CPU is too slow at loading data from the disk, leaving the fast GPU “starving” for data. We use DataLoaders with multiple workers to solve this.
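A minimal PyTorch sketch of this workflow: pick a device, move the model and each batch to it, and use a multi-worker DataLoader to keep the GPU fed (the dataset, batch size, and worker count are illustrative assumptions):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(20, 3).to(device)   # move the model to GPU memory

dataset = TensorDataset(torch.rand(1000, 20), torch.randint(0, 3, (1000,)))
loader = DataLoader(dataset, batch_size=64,
                    num_workers=4,          # multiple workers load data in parallel
                    pin_memory=True)        # speeds up CPU-to-GPU transfers

for inputs, labels in loader:
    # Each batch must also be moved to the same device as the model.
    inputs, labels = inputs.to(device), labels.to(device)
    outputs = model(inputs)
```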

Comparison: TensorFlow vs. PyTorch

| Feature     | TensorFlow                                         | PyTorch                            |
|-------------|----------------------------------------------------|------------------------------------|
| Philosophy  | "Production-first," static graphs (traditionally). | "Research-first," dynamic graphs.  |
| Ease of Use | High (via Keras API).                              | High (very Pythonic/intuitive).    |
| Deployment  | Excellent (TF Serving, TF Lite).                   | Strong (TorchScript, ONNX).        |
| Community   | Massive industry backing.                          | Massive academic/research backing. |
