Introduction — Play the Emoji Scavenger Hunt
The challenge: find 8 items in the real world within a time limit. Point your phone camera at objects that match the emoji shown.
Reflect: Did you manage to win? What strategy worked? Was the computer able to identify all items? Did room lighting affect the identification?
🔹 A Quick Overview of Computer Vision
Computer Vision is a field of AI that enables systems to process, analyse and make sense of visual data (images and videos) in much the way humans do.
5.1 Computer Vision vs Image Processing
👁️ Computer Vision
Deals with extracting information from input images/videos to infer meaningful understanding and predict visual input.
Superset of Image Processing.
Examples: Object detection, Handwriting recognition.
🖼️ Image Processing
Mainly focused on processing raw input images to enhance them or prepare them for other tasks.
Subset of Computer Vision.
Examples: Rescaling image, Correcting brightness, Changing tones.
5.2 Applications of Computer Vision
Computer Vision was first introduced in the 1970s. Today it is widely used across industries — from facial recognition and retail analytics to self-driving cars and medical imaging.
5.3 Computer Vision Tasks
Every CV application is built on a set of core tasks performed on input images:
Classification: Assigning an input image one label from a fixed set of categories. Core CV problem — simple but widely used.
Classification + Localisation: Identifying what object is present AND where it is in the image. Used for single objects.
Object Detection: Finding instances of real-world objects — faces, bicycles, buildings — in images or videos. Uses extracted features + learning algorithms. Common in image retrieval, parking systems.
Instance Segmentation: Detecting objects, assigning each a category, and labelling every pixel accordingly. Output: a collection of regions/segments.
5.4 Basics of Images — Pixels
A pixel is the smallest unit of a digital image, arranged in a 2D grid.
5.5 Resolution
🔹 Two Ways to Express Resolution
- As dimensions: Width × Height (e.g., 1920 × 1080 pixels).
- As a total pixel count in megapixels (e.g., 1920 × 1080 ≈ 2.1 MP).
5.6 Pixel Value
🔹 Why the 0-255 Range?
- Computer data = binary system (0s and 1s).
- Each pixel uses 1 byte = 8 bits.
- Each bit has 2 possible values (0 or 1).
- 8 bits → 2⁸ = 256 possibilities (0 to 255).
- 0 = no colour / black. 255 = full colour / white.
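The 8-bit arithmetic above can be checked directly in Python — a minimal sketch using NumPy's `uint8` type, which is exactly the one-byte-per-pixel representation described here:

```python
import numpy as np

# 8 bits -> 2**8 = 256 possible values per pixel: 0 through 255.
print(2 ** 8)  # 256

# A single pixel stored in one byte (8 bits):
pixel = np.uint8(255)            # brightest possible value
print(pixel, pixel.dtype)        # 255 uint8

# The full representable range of an unsigned 8-bit integer:
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255
```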
5.7 Grayscale Images
- Darkest shade = Black (pixel value 0).
- Lightest shade = White (pixel value 255).
- Intermediate shades = equal brightness of the three primary colours.
- Each pixel = 1 byte (8 bits). Image = single plane / 2D array of pixels.
- Size of grayscale image = Height × Width (single-plane).
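A grayscale image really is just a single 2D array, as the points above describe. A minimal NumPy sketch (the 4×5 dimensions are arbitrary, chosen for illustration):

```python
import numpy as np

# A tiny 4x5 grayscale image: one 2D plane of 8-bit pixels.
gray = np.zeros((4, 5), dtype=np.uint8)  # all pixels 0 -> black
gray[1:3, 2:4] = 255                     # a small white patch

print(gray.shape)  # (4, 5) -> Height x Width, no channel axis
print(gray.size)   # 20 pixels = Height * Width
```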
5.8 RGB Images
🔹 How RGB Images Are Stored
Every RGB image is stored in three separate channels:
Each pixel has 3 values (one per channel). All three channels combined form a colour image. Size of RGB image = Height × Width × 3.
- Output colour when R=G=B=255 → White.
- Output colour when R=G=B=0 → Black.
- How does the colour vary when one of the three channels is 0 and the other two vary? → you get mixes of the remaining two colours (e.g., with B = 0, varying R and G gives shades of yellow).
- When all three vary in same proportion → shades of gray.
- RGB value of your favourite colour from the palette → experiment!
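The colour rules above can be verified with a tiny 2×2 RGB array — a minimal NumPy sketch (pixel positions and values chosen purely for illustration):

```python
import numpy as np

# A 2x2 RGB image: Height x Width x 3 channels.
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = [255, 255, 255]  # R=G=B=255 -> white
img[0, 1] = [0, 0, 0]        # R=G=B=0   -> black
img[1, 0] = [128, 128, 128]  # equal mid values -> a shade of gray
img[1, 1] = [255, 0, 0]      # only the red channel -> pure red

print(img.shape)  # (2, 2, 3) -> Height x Width x 3
```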
5.9 No-Code AI Tools for CV
🎨 1. Lobe
- Lobe.ai is an AutoML tool — a no-code AI tool.
- Works with image classification.
- Provide a set of labelled images → Lobe automatically finds the best-suited model to classify them.
🎯 2. Teachable Machine
- Machine-learning tool developed by Google in 2017.
- Runs on TensorFlow.js.
- Web-based tool to train a model using images, audio, or poses — input via webcam or pictures.
🟠 3. Orange Data Mining
Has dedicated widgets for image classification — covered in Unit 4.
🔹 Activity — Smart Sorter
- Form groups of 4 members.
- Find images of Bottles, Cans and Paper online or from your surroundings.
- Visit the no-code AI tool.
- Build 3 different classes: Bottles · Cans · Paper.
- Train the model.
- Test the classifier on new images!
🔹 Use-Case Walkthrough — Coral Bleaching Detection
Coral Bleaching happens when corals lose their colour due to stress (rising sea temperatures, pollution, acidification). It disturbs the balance of aquatic ecosystems. Detecting bleached corals early can help save marine biodiversity.
Using Orange Data Mining with image-based widgets, we can build a classification model to detect healthy vs bleached corals — saving marine biodiversity.
5.10 Image Features
🔹 What Makes a Good Feature?
Imagine a security camera capturing an image. Three types of patches we might try to find:
Blobs (flat regions): Spread over a large area. Can appear anywhere in that region — hardest to locate exactly.
Edges: Edges of a building. You can find the approximate location, but the pattern looks the same all along the edge — still hard to pin down.
Corners: Corners of a building. Wherever you move this patch, it looks different — easiest to find, and therefore the best features.
5.11 Convolution
An image convolution = element-wise multiplication of an image region with another array called the kernel, followed by summing the products.
I * K = Result of applying convolution
The kernel is passed over the whole image (slid across) to get the resulting array after convolution.
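The slide-multiply-sum operation described above can be sketched in plain NumPy. This is a minimal illustration, not an optimised implementation; the image and kernel values are arbitrary examples:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; at each position,
    multiply element-wise and sum the products."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1  # output shrinks without padding
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.array([[1, 2, 3, 0],
                  [4, 5, 6, 1],
                  [7, 8, 9, 2]], dtype=float)
kernel = np.array([[0, 1],
                   [1, 0]], dtype=float)

print(convolve2d(image, kernel))  # a 2x3 result: smaller than the input
```

Note the output is smaller than the input image — a 3×4 image convolved with a 2×2 kernel gives a 2×3 result, which is why padding (discussed below) is used when the original size must be kept.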
5.12 What is a Kernel?
A kernel is a small matrix that is slid over the image to modify it — different kernels produce different effects (blur, sharpen, edge detection).
Try:
- Change all kernel values to positive → see what happens.
- Change all to negative → observe.
- Mix negative and positive values → observe.
- Make 4 numbers negative, one positive → notice the pattern.
🔹 Why Use Convolution in CNN?
- To extract features from images — edges, corners, patterns.
- Convolution is used in the Convolutional Neural Network (CNN) for feature extraction (next section).
- The centre of the kernel overlaps each image pixel as it slides; the output becomes smaller because edge pixels can't be fully covered by the kernel.
- To keep the same size, we extend edges with zero padding.
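Zero padding can be demonstrated with NumPy's `np.pad` — a minimal sketch with an arbitrary 2×2 example image:

```python
import numpy as np

img = np.array([[1, 2],
                [3, 4]])

# Extend every edge with a border of zeros so the kernel's centre
# can sit on every original pixel and the output keeps the same size.
padded = np.pad(img, pad_width=1, mode='constant', constant_values=0)
print(padded.shape)  # (4, 4)
print(padded)
# [[0 0 0 0]
#  [0 1 2 0]
#  [0 3 4 0]
#  [0 0 0 0]]
```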
5.13 Convolutional Neural Network (CNN)
🔹 CNN Workflow
Input image → Convolution Layer → ReLU Layer → Pooling Layer → Fully Connected Layer → Output label.
5.14 Layers of CNN
🔲 1. Convolution Layer
- The first layer of a CNN.
- Objective: extract features such as edges from the input image.
- The first Convolution Layer captures low-level features (edges, colour, gradient orientation).
- Added layers capture high-level features — giving the network a complete understanding of the images.
- Uses the convolution operation with several kernels to produce several features.
- Output = Feature Map (also called Activation Map).
- Benefits of Feature Map:
- Reduces image size for efficient processing.
- Focuses only on features important for further processing — e.g., eyes, nose, mouth are enough to recognise a person; you don't need the whole face.
📈 2. Rectified Linear Unit (ReLU) Layer
- After the feature map is generated, it is passed to the ReLU layer.
- What it does: simply gets rid of all negative numbers in the feature map; lets positive numbers stay as-is.
- This introduces non-linearity in the feature map.
- Why? It makes colour change more abrupt and obvious — sharper edges.
- Smooth grey gradient → more abrupt edges → better features for later layers → stronger CNN.
If x < 0 → output = 0 · If x ≥ 0 → output = x
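The rule above, ReLU(x) = max(0, x), is a one-liner in NumPy — a minimal sketch with an arbitrary example feature map:

```python
import numpy as np

def relu(x):
    # Negative values become 0; non-negative values pass through unchanged.
    return np.maximum(0, x)

feature_map = np.array([[-3, 1],
                        [ 2, -7]])
print(relu(feature_map))  # negatives zeroed: 0 and 1 on top, 2 and 0 below
```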
🎯 3. Pooling Layer
- Similar to convolution layer — responsible for reducing the spatial size of the convolved feature while retaining important features.
- Two types of pooling:
Returns the maximum value from the portion of the image covered by the kernel.
Returns the average of values from the portion of the image covered by the kernel.
🔹 Why Pooling?
- Makes the image smaller and more manageable.
- Makes the image resistant to small transformations, distortions, translations — a small difference in input creates a very similar pooled image.
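Max pooling with a 2×2 window can be sketched in NumPy as follows (a minimal, non-strided illustration; the feature-map values are arbitrary):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Max pooling: keep only the largest value in each size x size window."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            out[i // size, j // size] = feature_map[i:i + size, j:j + size].max()
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [0, 1, 3, 4]], dtype=float)

print(max_pool(fm))  # a 4x4 map shrinks to 2x2, keeping each window's maximum
```

Average pooling works the same way with `.mean()` in place of `.max()`.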
🔗 4. Fully Connected (FC) Layer
- The final layer of the CNN.
- Takes results of convolution + pooling and uses them to classify the image into a label.
- Output of conv/pooling is flattened into a single vector of values, each representing a probability that a feature belongs to a certain label.
- Example: for an image of a cat, features like whiskers or fur should have high probability for the label "cat".
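The flatten-then-classify step can be sketched in NumPy. Note the weight values and the two labels below are made up purely for illustration — a real FC layer learns its weights during training:

```python
import numpy as np

# Hypothetical pooled feature map from the earlier layers.
pooled = np.array([[6., 4.],
                   [7., 9.]])

flat = pooled.flatten()  # 2x2 map -> a single vector of 4 values
print(flat.shape)        # (4,)

# Illustrative-only weights for two labels ("cat", "dog").
weights = np.array([[0.2, 0.1, 0.3, 0.4],
                    [0.1, 0.3, 0.2, 0.1]])
scores = weights @ flat

# Softmax turns raw scores into label probabilities that sum to 1.
probs = np.exp(scores) / np.exp(scores).sum()
print(probs.sum())  # 1.0
```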
5.15 CNN Summary — Putting It All Together
| Layer | Purpose | What Happens |
|---|---|---|
| 1. Convolution | Extract features | Kernels slide over image producing feature maps (edges, colour, gradient). |
| 2. ReLU | Non-linearity | Replaces all negative values with 0; positive values unchanged. |
| 3. Pooling | Reduce size | Max or Average pooling shrinks feature map while keeping important info. |
| 4. Fully Connected | Classify | Flattens features and assigns final label probabilities. |
5.16 Python Libraries for Computer Vision
Three major Python libraries are used to build CV projects:
TensorFlow: Open-source ML library by Google. Used for building and training deep-learning models, including CNNs for image classification and object detection.
Keras: High-level Deep Learning API that runs on top of TensorFlow. Makes it simple to build and train neural networks with just a few lines of Python code.
OpenCV: Open-source Computer Vision library. Works with real-time image and video processing — reading, editing, applying filters, face detection, and more.
🔹 Applications of OpenCV
- Face detection in photos and videos.
- Object tracking in live video streams.
- Motion detection in security footage.
- Image editing — rescaling, rotating, cropping, colour changes.
- OCR (Optical Character Recognition) — read text from images.
- Augmented reality apps.
- Self-driving cars — detect lanes, traffic signs, vehicles.
- Medical imaging — analyse X-rays, MRIs, CT scans.
🔹 Sample Python CV Code (OpenCV)
```python
import cv2

img = cv2.imread('photo.jpg')
print("Shape:", img.shape)
print("Data type:", img.dtype)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

cv2.imshow('Original', img)
cv2.imshow('Grayscale', gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
5.17 Practical Programs for CV (Class X)
As per the CBSE Class X syllabus (Practical list), you should be able to write these programs:
- Read an image and display using Python (OpenCV imread + imshow).
- Read an image and identify its shape using Python (img.shape gives Height × Width × Channels).
Program: Read and Display an Image
```python
import cv2

img = cv2.imread('photo.jpg')
cv2.imshow('My Image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
Program: Identify Image Shape
```python
import cv2

img = cv2.imread('photo.jpg')
print("Shape of image:", img.shape)
print("Height:", img.shape[0])
print("Width:", img.shape[1])
print("Channels:", img.shape[2])
```
Quick Revision — Key Points to Remember
- Computer Vision (CV) = AI field that enables machines to see, observe and make sense of visual data.
- CV vs Image Processing: CV extracts meaning (superset). Image Processing enhances images (subset).
- Applications: Facial Recognition · Face Filters · Google Image Search · Retail Analytics · Inventory · Self-Driving Cars · Medical Imaging · Google Translate · Agricultural Drones · Attendance.
- 4 CV Tasks: Classification · Classification+Localisation · Object Detection · Instance Segmentation.
- Pixel = smallest unit of a digital image, arranged in a 2D grid.
- Resolution = number of pixels. Expressed as Width×Height or in megapixels.
- Pixel value = 0 to 255 (1 byte / 8 bits). 0 = black, 255 = white.
- Grayscale image = single 2D plane of pixels, each pixel 0-255.
- RGB image = 3 channels (R, G, B) stacked. Each pixel has 3 values. Size = H × W × 3.
- No-Code CV tools: Lobe · Teachable Machine · Orange Data Mining. Activities: Smart Sorter, Coral Bleaching detection.
- Image Features: blobs · edges · corners. Corners are the best features (unique locations).
- Convolution = element-wise multiplication of image + kernel, followed by sum. Slide kernel over image.
- Kernel = matrix that modifies the image — different kernels produce different effects (blur, sharpen, edge-detect).
- CNN (Convolutional Neural Network) = DL algorithm for image classification. 4 layers: Convolution → ReLU → Pooling → Fully Connected.
- Convolution Layer: extracts features using kernels → Feature Map.
- ReLU Layer: removes negatives; introduces non-linearity; ReLU(x) = max(0, x).
- Pooling Layer: Max Pooling (max value) or Average Pooling. Reduces size, keeps important features.
- Fully Connected (FC) Layer: flattens features → classifies image into label probabilities.
- Python Libraries: TensorFlow (Google's DL library) · Keras (high-level API on top of TF) · OpenCV (CV & real-time video/image processing).
- OpenCV applications: face detection, object tracking, motion detection, image editing, OCR, AR, self-driving cars, medical imaging.