Computer Vision is a field of Artificial Intelligence that enables machines to see, interpret, and understand visual information from the real world, such as images and videos.
Humans process visual data naturally, but for machines, images are just numbers. Computer Vision bridges this gap by converting visual data into numerical representations and then applying algorithms to extract meaning.
Computer Vision is used in:
To a computer, an image is not a picture, but a matrix of numbers.
Each number represents the intensity of light at a pixel; in a standard 8-bit grayscale image, it is an integer from 0 (black) to 255 (white).
A pixel (picture element) is the smallest unit of an image.
Example:
A 28×28 grayscale image (such as an MNIST digit) is a matrix with 28 rows and 28 columns, holding 784 pixel values.
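To make this concrete, the short Python sketch below stores a grayscale image as a plain nested list, reads its shape, and rescales the 0 to 255 intensities to the range 0.0 to 1.0, a common preprocessing step before pixels are fed to a model. The stroke drawn into the image is made-up data, and a real project would more likely use a NumPy array than nested lists.

```python
# A minimal sketch: a grayscale image as a 2D list of 8-bit intensities.
# The 28x28 size mirrors an MNIST digit; the pixel values are invented.
HEIGHT, WIDTH = 28, 28

# Start with an all-black image (0 = black, 255 = white).
image = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]

# Draw a hypothetical vertical white stroke in column 14, rows 4..23.
for row in range(4, 24):
    image[row][14] = 255

def shape(img):
    """Return (rows, columns) of a 2D list."""
    return len(img), len(img[0])

def normalize(img):
    """Scale 0..255 integers to floats in [0.0, 1.0]."""
    return [[p / 255.0 for p in row] for row in img]

rows, cols = shape(image)
print(rows, cols, rows * cols)                  # 28 28 784
print(image[10][14], normalize(image)[10][14])  # 255 1.0
```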
Color images are represented using 3 channels:
Red
Green
Blue
This forms a 3D matrix:
Height × Width × 3
Each pixel contains three values, one intensity per color channel.
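The same idea extends to color. The sketch below stores a tiny 2×2 RGB image as a Height × Width × 3 nested list and collapses it to a single grayscale channel using the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which is one common conversion formula. The pixel values are invented for illustration.

```python
# A minimal sketch: a 2x2 color image stored as Height x Width x 3 nested lists.
# Channel order here is (R, G, B); OpenCV's imread returns BGR order instead.
image = [
    [[255, 0, 0], [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
]

height, width, channels = len(image), len(image[0]), len(image[0][0])

def to_grayscale(img):
    """Collapse 3 channels into 1 using the ITU-R BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in img
    ]

gray = to_grayscale(image)
print(height, width, channels)  # 2 2 3
print(gray)                     # [[76, 150], [29, 255]]
```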
Understanding image representation is essential for:
OpenCV (Open Source Computer Vision Library) is a widely used open-source library for real-time computer vision tasks.
It provides tools for:
In face detection:
Traditional machine learning algorithms fail on images because:
Convolutional Neural Networks (CNNs) solve this by automatically learning visual features.
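One way to see the advantage is a rough parameter count. Assume a traditional model that treats every pixel of a flattened 224×224×3 image as a separate input to a dense layer of 1000 units (sizes chosen only for illustration). The number of weights explodes, whereas a convolutional layer reuses the same small filters at every position, so its size does not depend on the image resolution at all.

```python
# A back-of-the-envelope sketch with illustrative sizes (not any specific model).

def dense_params(height, width, channels, units):
    """Weights + biases of a fully connected layer on a flattened image."""
    inputs = height * width * channels  # every pixel value is its own input
    return inputs * units + units

def conv_params(kernel, in_channels, filters):
    """Weights + biases of a conv layer; the image size never appears."""
    return kernel * kernel * in_channels * filters + filters

print(dense_params(224, 224, 3, 1000))  # 150529000
print(conv_params(3, 3, 64))            # 1792
```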
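In a convolutional layer, each filter (a small grid of learned weights, such as 3×3) slides over the image and computes a weighted sum of the pixels beneath it at every position; this operation is called convolution.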
A ReLU activation, ReLU(x) = max(0, x), is usually applied after convolution to introduce non-linearity.
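The convolution and ReLU steps can be sketched in pure Python as follows. This is an illustrative sketch under simplifying assumptions (a single channel, no padding, stride 1), with a hand-picked vertical-edge filter standing in for the weights a CNN would learn. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this computes cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1), summing products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

def relu(feature_map):
    """Replace every negative value with 0."""
    return [[max(0, v) for v in row] for row in feature_map]

# Made-up 4x5 image containing a bright vertical line in column 2.
image = [[0, 0, 9, 0, 0] for _ in range(4)]

# Hand-picked vertical-edge filter; a CNN would learn these weights.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = convolve2d(image, kernel)  # [[27, 0, -27], [27, 0, -27]]
activated = relu(features)            # [[27, 0, 0], [27, 0, 0]]
```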
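Pooling then downsamples the resulting feature maps. The most common type is max pooling, which keeps only the largest value in each window (for example, each 2×2 block).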
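A minimal max-pooling sketch, assuming a 2×2 window with stride 2 applied to a single made-up 4×4 feature map; the output is half the size in each dimension.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        out.append(row)
    return out

fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 8, 3],
]
pooled = max_pool(fmap)  # [[6, 2], [2, 8]]
```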
Image classification assigns a label to an entire image.
Example:
CNNs handle these challenges effectively.
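At the output end of a classifier, a CNN typically produces one raw score per class, and softmax is a common way to turn those scores into probabilities. The sketch below assumes hypothetical scores and class names and simply picks the most probable label.

```python
import math

def softmax(scores):
    """Turn raw class scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical final-layer scores for one image over three made-up classes.
labels = ["cat", "dog", "car"]
scores = [2.0, 1.0, 0.1]
label, confidence = classify(scores, labels)  # label == "cat", confidence ~0.66
```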
Object detection identifies not only what objects are present in an image but also where they are located.
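Output includes, for each detected object:
A class label (what the object is)
A bounding box (where it is)
A confidence score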
Modern CNN-based detectors handle these challenges efficiently.
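Detector outputs are often handled as a list of records, each combining a class label, a location, and a confidence score. The structure below is hypothetical (the Detection class, the (x_min, y_min, x_max, y_max) box convention, and the 0.5 threshold are assumptions, not any particular library's API), shown with made-up detections and a simple confidence filter.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # what the object is
    confidence: float  # score in [0, 1]
    box: tuple         # (x_min, y_min, x_max, y_max) in pixels

    def center(self):
        """Center point of the bounding box."""
        x_min, y_min, x_max, y_max = self.box
        return ((x_min + x_max) / 2, (y_min + y_max) / 2)

def keep_confident(detections, threshold=0.5):
    """Drop detections whose confidence is below the threshold."""
    return [d for d in detections if d.confidence >= threshold]

# Made-up detector output for one image.
raw = [
    Detection("person", 0.92, (40, 30, 120, 250)),
    Detection("dog", 0.81, (150, 160, 260, 240)),
    Detection("cat", 0.12, (10, 10, 30, 30)),
]
kept = keep_confident(raw)  # keeps "person" and "dog"
```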
Transfer learning uses a pre-trained model and adapts it to a new task.
Instead of training from scratch, we reuse learned knowledge.
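A toy sketch of the feature-extraction flavor of transfer learning: the "pre-trained" layer below is frozen, and only a new logistic-regression head is trained on a small target dataset. Everything here is an assumption for illustration. The frozen weights are made up rather than learned on a large dataset; in practice the frozen part would be a CNN trained on a large image collection, with the new head trained in a deep learning framework.

```python
import math

# Hypothetical "pre-trained" weights, standing in for layers learned elsewhere.
# They are frozen: nothing below ever modifies them.
PRETRAINED_WEIGHTS = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]

def extract_features(x):
    """Frozen layer: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(data, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on the frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = extract_features(x)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    f = extract_features(x)
    return int(sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5)

# Tiny made-up target-task dataset: 4 input values -> binary label.
data = [
    ([1.0, 0.0, 0.0, 0.0], 1),
    ([0.9, 0.1, 0.0, 0.0], 1),
    ([0.0, 0.0, 1.0, 0.0], 0),
    ([0.0, 0.0, 0.8, 0.1], 0),
]
w, b = train_head(data)
```

Reusing the frozen extractor means only three numbers (two weights and a bias) are learned for the new task, which is why this approach can work with very little target data.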