VM-LEARNING /class.x ·track.ai ·ch-b5 session: 2026_27
PART B ▪ UNIT 5
10
Computer Vision
Pixels · RGB · Image Features · Convolution · CNN
Computer Vision (CV) is a field of Artificial Intelligence that enables machines to see, observe and make sense of visual data — images and videos — in a way similar to how humans do. It involves extraction of information from images, text and videos to infer meaningful understanding and make predictions.
Just as AI enables computers to think, Computer Vision enables AI to see. It is a subset of AI that includes techniques from Deep Learning and Machine Learning.

Introduction — Play the Emoji Scavenger Hunt

Try this game: https://emojiscavengerhunt.withgoogle.com/

The challenge: find 8 items in the real world within a time limit. Point your phone camera at objects that match the emoji shown.

Reflect: Did you manage to win? What strategy worked? Was the computer able to identify all items? Did room lighting affect the identification?
🔹 A Quick Overview of Computer Vision

Computer Vision is a system that can process, analyze and make sense of visual data the same way humans do.

👁️ HUMAN VISION vs COMPUTER VISION
Human Vision:    Object → Eye (sensing device) → Brain (interpreting device)
Computer Vision: Object → Camera (sensing device) → AI (interpreting device)

5.1 Computer Vision vs Image Processing

👁️ Computer Vision

Deals with extracting information from input images/videos to infer meaningful understanding and make predictions.

Superset of Image Processing.

Examples: Object detection, Handwriting recognition.

🖼️ Image Processing

Mainly focused on processing raw input images to enhance them or prepare them for other tasks.

Subset of Computer Vision.

Examples: Rescaling image, Correcting brightness, Changing tones.

Learning Outcome 1: Define CV and understand its applications

5.2 Applications of Computer Vision

Computer Vision was first introduced in the 1970s. Today it is widely used across industries:

🧑 Facial Recognition: Used in smart cities and smart homes for security, guest recognition, and school attendance systems.
😀 Face Filters: Instagram and Snapchat identify facial dynamics and apply selected filters through the camera.
🔍 Google Search by Image: Upload an image and Google compares its features against a massive database to return matching results.
🛍️ Computer Vision in Retail: Tracks customers' movements, analyses navigation paths, and detects walking patterns for better store layouts.
📦 Inventory Management: Security cameras plus CV algorithms estimate stock accurately and suggest better shelf placement.
🚗 Self-Driving Cars: Identify objects, find navigational routes, and monitor the environment for autonomous vehicles.
🏥 Medical Imaging: Helps doctors interpret X-rays and MRIs, and converts 2D scans into interactive 3D models.
🌐 Google Translate App: Point the phone camera at foreign-language signs; OCR plus augmented reality overlays an instant translation.
🌾 Agricultural Drones: Drones with high-resolution cameras monitor crop health, detect pests and diseases, and estimate yields.
🎓 Attendance Systems: Automated face-recognition-based attendance in schools and offices.
Learning Outcome 2: Understand Computer Vision Tasks

5.3 Computer Vision Tasks

Every CV application is built on a set of core tasks performed on input images:

🏷️ 1. Classification

Assigning an input image one label from a fixed set of categories. Core CV problem — simple but widely used.

📍 2. Classification + Localisation

Identifying what object is present AND where it is in the image. Used for single objects.

🎯 3. Object Detection

Finding instances of real-world objects — faces, bicycles, buildings — in images or videos. Uses extracted features + learning algorithms. Common in image retrieval, parking systems.

🎨 4. Instance Segmentation

Detecting objects, giving them a category, and labelling each pixel based on that. Output: a collection of regions/segments.

5.4 Basics of Images — Pixels

The word pixel means "picture element". Every digital photograph is made up of pixels — the smallest units of information that make up a picture. They are usually square and arranged in a 2-dimensional grid.
The more pixels you have, the more closely the image resembles the original. Fewer pixels = pixelated, blocky image.

5.5 Resolution

Resolution = the number of pixels in an image.
🔹 Two Ways to Express Resolution
📐 Width × Height: e.g., 1280 × 1024 means 1280 pixels from left to right and 1024 from top to bottom.
📸 Megapixels: A single number in millions of pixels, e.g., a 5-megapixel camera captures about 5 million pixels (width × height ≈ 5,000,000). A 1280 × 1024 monitor has 1,310,720 pixels = 1.31 MP.
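The megapixel arithmetic above can be verified in a couple of lines of Python:

```python
# A 1280 x 1024 display, as in the example above.
width, height = 1280, 1024
total_pixels = width * height
print(total_pixels)                        # 1310720
print(round(total_pixels / 1_000_000, 2))  # 1.31 megapixels
```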

5.6 Pixel Value

Each pixel has a pixel value that describes its brightness and colour. The most common pixel format is the byte image — the value is stored as an 8-bit integer with range 0 to 255.
🔹 Why the 0-255 Range?
An 8-bit integer can take 2⁸ = 256 distinct values. Counting from 0, that gives the range 0 to 255: 0 is the darkest value (black) and 255 the brightest (white).
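The 0-255 range follows from storing each pixel value in one byte (8 bits). A quick check in Python, using NumPy's uint8 dtype (the dtype most Python image libraries use for byte images):

```python
import numpy as np

# 8 bits give 2**8 = 256 possible values: 0 through 255.
print(2 ** 8)  # 256

# Byte images use NumPy's uint8 dtype, which enforces this range;
# arithmetic that passes 255 wraps around modulo 256.
a = np.array([250, 255], dtype=np.uint8)
print(a + 1)  # 255 + 1 wraps back to 0
```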

5.7 Grayscale Images

Grayscale images have a range of shades of gray without apparent colour. Each pixel carries a single value from 0 (black) to 255 (white), so a grayscale image is a single 2D plane of pixels.

5.8 RGB Images

All coloured images we see are RGB images. They are made up of three primary colours — Red (R), Green (G), Blue (B). All visible colours come from different intensities of these three.
🔹 How RGB Images Are Stored

Every RGB image is stored in three separate channels:

R Channel: Intensity of red for each pixel (0-255).
G Channel: Intensity of green for each pixel (0-255).
B Channel: Intensity of blue for each pixel (0-255).

Each pixel has 3 values (one per channel). All three channels combined form a colour image. Size of RGB image = Height × Width × 3.
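A sketch of this channel layout in NumPy, using a hypothetical 2 × 2 image:

```python
import numpy as np

# A hypothetical 2x2 RGB image built from its three channels,
# each holding one 0-255 intensity per pixel.
red   = np.array([[255,   0], [  0, 255]], dtype=np.uint8)
green = np.array([[  0, 255], [  0, 255]], dtype=np.uint8)
blue  = np.array([[  0,   0], [255, 255]], dtype=np.uint8)

img = np.dstack([red, green, blue])  # stack the three channels
print(img.shape)   # (2, 2, 3): Height x Width x 3
print(img[0, 0])   # three values for one pixel: R=255, G=0, B=0 (pure red)
```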

Activity — RGB Calculator: Visit https://www.w3schools.com/colors/colors_rgb.asp, then answer:
  1. Output colour when R=G=B=255 → White.
  2. Output colour when R=G=B=0 → Black.
  3. How does the colour vary when one of the three is 0 and the other two vary? → You get combinations of the other two colours.
  4. When all three vary in same proportion → shades of gray.
  5. RGB value of your favourite colour from the palette → experiment!
Pixel Art Task: Visit www.piskelapp.com and create your own pixel art. Try making a GIF!
Learning Outcome 3: Use No-Code AI tools for Computer Vision

5.9 No-Code AI Tools for CV

🎨 1. Lobe

A free desktop app from Microsoft: drag in labelled example images and it trains an image-classification model with no code.

🎯 2. Teachable Machine

Google's free browser-based tool: train image, sound, or pose models from examples captured with your webcam, then export them for use in projects.

🟠 3. Orange Data Mining

Has dedicated widgets for image classification — covered in Unit 4.

Activity — Build a Smart Sorter (Teachable Machine or Lobe):
  1. Form groups of 4 members.
  2. Find images of Bottles, Cans and Paper online or from your surroundings.
  3. Visit the no-code AI tool.
  4. Build 3 different classes: Bottles · Cans · Paper.
  5. Train the model.
  6. Test the classifier on new images!
🔹 Use-Case Walkthrough — Coral Bleaching Detection
What are Coral Reefs? Coral reefs are large underwater structures made of skeletons of marine invertebrates. Found in tropical ocean waters — integral to aquatic life.

Coral Bleaching happens when corals lose their colour due to stress (rising sea temperatures, pollution, acidification). It disrupts the balance of aquatic ecosystems. Detecting bleached corals early can help save marine ecosystems.

Using Orange Data Mining with image-based widgets, we can build a classification model to detect healthy vs bleached corals — saving marine biodiversity.
Learning Outcome 4: Understand Image Features & Convolution

5.10 Image Features

In computer vision, a feature is a piece of information that is relevant for solving a computational task. Features may be specific structures — points, edges, or objects.
🔹 What Makes a Good Feature?

Imagine a security camera capturing an image. Three types of patches we might try to find:

🟦 Patch A & B — Flat Areas

Spread over a lot of area. Can appear anywhere in that region. Hardest to locate exactly.

▬ Patch C & D — Edges

Edges of a building. Can find approximate location, but the pattern is the same all along the edge — still hard to pin down.

▪ Patch E & F — Corners

Corners of a building. Wherever you move this patch, it looks differenteasiest to find & best features.

Conclusion: In image processing we can extract blobs, edges, or corners as features. But corners are the best features because they are unique — they can only be found at a particular location. Edges are second-best; flat areas are worst.

5.11 Convolution

Convolution is a simple mathematical operation fundamental to many image-processing operators. It multiplies together two arrays of numbers (of the same dimensionality but different sizes) to produce a third array.

An image convolution = element-wise multiplication of image arrays and another array called the kernel, followed by sum.

I = Image Array   ·   K = Kernel Array
I * K = Result of applying convolution

The kernel is passed over the whole image (slid across) to get the resulting array after convolution.

When we edit photos in Photoshop, or apply filters on Instagram / Snapchat, we are using convolution internally! The filter is simply a kernel that modifies pixel values to produce an effect.
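The slide-multiply-sum operation described above can be sketched from scratch in NumPy (a minimal illustration with made-up values; as in most CV material, the kernel is applied without flipping):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; at each position multiply
    element-wise and sum. Valid positions only (no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw]
            out[r, c] = np.sum(patch * kernel)
    return out

# A toy 4x4 "image" and a 2x2 kernel (made-up values).
I = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 8, 7, 6],
              [5, 4, 3, 2]])
K = np.array([[1, 0],
              [0, 1]])

result = convolve2d(I, K)
print(result)  # top-left entry: 1*1 + 2*0 + 5*0 + 6*1 = 7
```

Notice that the output is smaller than the input: a 2 × 2 kernel fits in only 3 × 3 positions of a 4 × 4 image.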

5.12 What is a Kernel?

A Kernel is a matrix that is slid across the image and multiplied with the input, such that the output is enhanced in a certain desirable manner. Each kernel has different values for different effects — sharpen, blur, edge-detect, emboss.
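For illustration, a few widely used 3 × 3 kernels written as NumPy arrays (the values are the commonly quoted ones; results on a real image vary):

```python
import numpy as np

# Each kernel, convolved with an image, produces a different effect.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])
box_blur = np.full((3, 3), 1 / 9)   # averages each 3x3 neighbourhood
edge_detect = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

# The sums are telling: 1 preserves overall brightness (sharpen, blur),
# 0 responds only to changes in brightness (edge detection).
print(int(sharpen.sum()), round(float(box_blur.sum()), 6), int(edge_detect.sum()))
```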
Activity — Online Kernel Tool: https://setosa.io/ev/image-kernels/
Try:
  1. Change all kernel values to positive → see what happens.
  2. Change all to negative → observe.
  3. Mix negative and positive values → observe.
  4. Make 4 numbers negative, one positive → notice the pattern.
Propose your own theory for how convolution works, then test it on different images!
🔹 Why Use Convolution in CNN?
Convolution lets the network learn its kernels from data during training instead of having them hand-designed, and because the same small kernel is reused across the whole image, far fewer weights are needed than in a fully connected layer.
Learning Outcome 5: Understand CNN architecture and layers

5.13 Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and be able to differentiate one from the other.
🔹 CNN Workflow
🔄 HOW A CNN WORKS
1. Input Image → 2. Convolution Layer → 3. ReLU Layer → 4. Pooling Layer → 5. Fully Connected Layer → 6. Output (Prediction)

5.14 Layers of CNN

🔲 1. Convolution Layer

Kernels slide over the input image, extracting features such as edges, colours and gradients into feature maps.

📈 2. Rectified Linear Unit (ReLU) Layer

Applies ReLU to every value of the feature map, replacing negative values with 0 and leaving the rest unchanged; this introduces non-linearity into the network.

ReLU(x) = max(0, x)
If x < 0 → output = 0   ·   If x ≥ 0 → output = x
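The formula above is one line of Python:

```python
# ReLU(x) = max(0, x): negatives become 0, everything else passes through.
def relu(x):
    return max(0, x)

print([relu(x) for x in [-3, -1, 0, 2, 5]])  # [0, 0, 0, 2, 5]
```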

🎯 3. Pooling Layer

Reduces the size of each feature map while retaining the important information. Two common types:

🔼 Max Pooling

Returns the maximum value from the portion of the image covered by the kernel.

📊 Average Pooling

Returns the average of values from the portion of the image covered by the kernel.
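Both pooling types can be sketched with a small NumPy function (the window size and values below are illustrative):

```python
import numpy as np

def pool(feature_map, size=2, mode="max"):
    """Slide a non-overlapping size x size window over the feature map
    and keep either the maximum or the average of each window."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            window = feature_map[r*size:(r+1)*size, c*size:(c+1)*size]
            out[r, c] = window.max() if mode == "max" else window.mean()
    return out

F = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [1, 4, 3, 8]])

print(pool(F, mode="max"))      # max of each 2x2 block: 6, 4, 7, 9
print(pool(F, mode="average"))  # mean of each 2x2 block
```

Either way, a 4 × 4 feature map shrinks to 2 × 2, keeping one summary value per window.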

🔹 Why Pooling?
Pooling shrinks the feature maps, reducing the computation and memory the network needs, and makes the detected features less sensitive to small shifts in the image.

🔗 4. Fully Connected (FC) Layer

The pooled feature maps are flattened into a single vector and passed to fully connected neurons that classify the image, producing a probability for each label.

5.15 CNN Summary — Putting It All Together

Layer | Purpose | What Happens
1. Convolution | Extract features | Kernels slide over the image producing feature maps (edges, colour, gradients).
2. ReLU | Non-linearity | Replaces all negative values with 0; positive values unchanged.
3. Pooling | Reduce size | Max or Average pooling shrinks the feature map while keeping important info.
4. Fully Connected | Classify | Flattens features and assigns final label probabilities.
CNN Conv and Pool layers can be stacked multiple times. Early layers learn simple features (edges), deeper layers learn complex features (shapes, objects).

5.16 Python Libraries for Computer Vision

Three major Python libraries are used to build CV projects:

🟠 TensorFlow

Open-source ML library by Google. Used for building and training deep-learning models including CNNs for image classification, object detection.

🔵 Keras

High-level Deep Learning API that runs on top of TensorFlow. Makes it simple to build and train neural networks with just a few lines of Python code.
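As a sketch of the Conv → ReLU → Pool → FC pipeline from this unit, here is a minimal Keras model (the layer sizes, 64 × 64 input, and 3-class output are illustrative assumptions, not a tuned design):

```python
# Layer sizes, the 64x64 input, and the 3-class output below are
# illustrative assumptions, not a tuned design.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),                # 64x64 RGB input
    layers.Conv2D(16, (3, 3), activation="relu"),  # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                   # max pooling
    layers.Flatten(),                              # flatten feature maps
    layers.Dense(3, activation="softmax"),         # probabilities for 3 classes
])
model.summary()  # prints the layer-by-layer architecture
```

Training it would only need `model.compile(...)` and `model.fit(...)` on labelled images, which is why Keras is described as needing just a few lines of code.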

🟢 OpenCV

Open-source Computer Vision library. Works with real-time image and video processing — reading, editing, applying filters, face detection, and more.

🔹 Applications of OpenCV
Face detection, object tracking, motion detection, image editing, OCR, augmented reality, self-driving cars, and medical imaging.
🔹 Sample Python CV Code (OpenCV)
import cv2

# Read the image (OpenCV loads colour images in BGR channel order)
img = cv2.imread('photo.jpg')
print("Shape:", img.shape)      # (height, width, channels)
print("Data type:", img.dtype)  # uint8: pixel values 0-255

# Convert the BGR colour image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Show both images until any key is pressed
cv2.imshow('Original', img)
cv2.imshow('Grayscale', gray)
cv2.waitKey(0)
cv2.destroyAllWindows()

5.17 Practical Programs for CV (Class X)

As per the CBSE Class X syllabus (Practical list), you should be able to write these programs:

Program: Read and Display an Image
import cv2

img = cv2.imread('photo.jpg')  # read the image file
cv2.imshow('My Image', img)    # open a window showing it
cv2.waitKey(0)                 # wait for any key press
cv2.destroyAllWindows()        # close all OpenCV windows
Program: Identify Image Shape
import cv2

img = cv2.imread('photo.jpg')
print("Shape of image:", img.shape)  # (height, width, channels)
print("Height:", img.shape[0])
print("Width:", img.shape[1])
print("Channels:", img.shape[2])

Sample output:
Shape of image: (480, 640, 3)
Height: 480
Width: 640
Channels: 3

Quick Revision — Key Points to Remember

  • Computer Vision (CV) = AI field that enables machines to see, observe and make sense of visual data.
  • CV vs Image Processing: CV extracts meaning (superset). Image Processing enhances images (subset).
  • Applications: Facial Recognition · Face Filters · Google Image Search · Retail Analytics · Inventory · Self-Driving Cars · Medical Imaging · Google Translate · Agricultural Drones · Attendance.
  • 4 CV Tasks: Classification · Classification+Localisation · Object Detection · Instance Segmentation.
  • Pixel = smallest unit of a digital image, arranged in a 2D grid.
  • Resolution = number of pixels. Expressed as Width×Height or in megapixels.
  • Pixel value = 0 to 255 (1 byte / 8 bits). 0 = black, 255 = white.
  • Grayscale image = single 2D plane of pixels, each pixel 0-255.
  • RGB image = 3 channels (R, G, B) stacked. Each pixel has 3 values. Size = H × W × 3.
  • No-Code CV tools: Lobe · Teachable Machine · Orange Data Mining. Activities: Smart Sorter, Coral Bleaching detection.
  • Image Features: blobs · edges · corners. Corners are the best features (unique locations).
  • Convolution = element-wise multiplication of image + kernel, followed by sum. Slide kernel over image.
  • Kernel = matrix that modifies the image — different kernels produce different effects (blur, sharpen, edge-detect).
  • CNN (Convolutional Neural Network) = DL algorithm for image classification. 4 layers: Convolution → ReLU → Pooling → Fully Connected.
  • Convolution Layer: extracts features using kernels → Feature Map.
  • ReLU Layer: removes negatives; introduces non-linearity; ReLU(x) = max(0, x).
  • Pooling Layer: Max Pooling (max value) or Average Pooling. Reduces size, keeps important features.
  • Fully Connected (FC) Layer: flattens features → classifies image into label probabilities.
  • Python Libraries: TensorFlow (Google's DL library) · Keras (high-level API on top of TF) · OpenCV (CV & real-time video/image processing).
  • OpenCV applications: face detection, object tracking, motion detection, image editing, OCR, AR, self-driving cars, medical imaging.
🧠 Practice Quiz — test yourself on this chapter