VM-LEARNING /class.xii ·track.ai ·ch-b3 session: 2026_27
PART B ▪ UNIT 3
03
Making Machines See
Computer Vision · Pixels · Process Stages · Applications · OpenCV
Computer Vision (CV) — a field of AI that enables systems to see, observe and understand the visual world. It uses sensing devices and deep-learning models to derive meaningful information from digital images, videos and other visual inputs, and to make recommendations or take actions accordingly. Also called Machine Vision.

Introduction — Why Machines Need to See

With social media (Facebook, Instagram, Twitter) and smartphone cameras, billions of images and videos are shared every day. Unlike text, indexing and searching images is hard — algorithms must organise image data by colour, texture, shape or metadata for quick retrieval. Traditionally, images relied on manual meta-descriptions; today we need computers to visually perceive images themselves.

Just as children are taught to associate an image (e.g., an apple) with the letter "A", computers must develop similar capabilities by repeatedly viewing labelled images.

🔹 Key Concepts You'll Learn
  1. Introduction to Computer Vision
  2. Working of Computer Vision
  3. Applications of Computer Vision
  4. Challenges of Computer Vision
  5. The Future of Computer Vision

Prerequisite: Basic understanding of digital imaging and knowledge of machine learning.

Learning Outcome 1: Explain the concept of computer vision and its significance in analysing visual data

1.1 How Do Machines See?

Computer Vision is analogous to human vision:

Human Vision | Computer Vision
Retina | Camera sensor
Optic nerve | Data bus
Visual cortex | Deep-learning model / algorithm

CV systems inspect products, infrastructure or production assets in real time — noticing defects or issues faster than humans. Due to speed, objectivity, continuity, accuracy and scalability, they often surpass human capabilities. Modern deep-learning models achieve above-human accuracy on tasks like facial recognition, object detection and image classification.

Learning Outcome 2: Demonstrate an understanding of the key stages involved in the computer vision process and their roles in interpreting images and videos

2.1 Working of Computer Vision — Basics of Digital Images

Digital Image — a picture stored on a computer as a sequence of numbers. Created by design software (Paint, Photoshop), digital cameras, or scanners.

2.2 Interpretation of an Image in Digital Form

Pixel = "Picture Element" — the smallest square in a digital image, each representing a specific colour value.
🔹 Binary-to-Decimal Quick Reference

Binary 00000000 = 2⁷·0 + 2⁶·0 + … + 2⁰·0 = 0 (black).
Binary 11111111 = 2⁷·1 + 2⁶·1 + … + 2⁰·1 = 255 (white).
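The quick-reference values above can be checked in a few lines of Python — a minimal sketch in which the 8-character bit strings stand for one 8-bit pixel:

```python
# Convert the two reference bit patterns using int() with base 2.
black = int("00000000", 2)  # all bits off
white = int("11111111", 2)  # all bits on

print(black, white)  # 0 255

# The same expansion written out as powers of two (2^7 down to 2^0).
bits = [1, 1, 1, 1, 1, 1, 1, 1]
value = sum(bit * 2 ** (7 - i) for i, bit in enumerate(bits))
print(value)  # 255
```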

Activity 3.1 — Binary Art: Recreating Images with 0s and 1s
  1. Choose any image from Pixabay / Unsplash / Pexels.
  2. Resize it to 200–300 px using imageresizer.com.
  3. Convert it to grayscale with pinetools.com.
  4. Extract the pixel values with the Boxentriq Pixel Value Extractor.
  5. Copy all the values and paste them into a Word / Google Doc.
  6. Select all the text and set the font size to 1.
  7. Observe the original image re-emerge from 0s and 1s!

2.3 Computer Vision Process — The 5 Stages

📷 Stage 1 · Image Acquisition

The initial stage — capturing digital images or videos. This is the raw data for subsequent analysis.

🧼 Stage 2 · Preprocessing

Aims to enhance the quality of the acquired image. Main goals: remove noise · highlight important features · ensure consistency across the dataset.

  1. Noise Reduction — removes blurriness, random spots, distortions. Example: removing grainy effects in low-light photos.
  2. Image Normalization — standardises pixel values to a consistent range (e.g., 0–1 or −1 to +1). Example: scaling 0–255 → 0–1.
  3. Resizing / Cropping — makes all images uniform. Example: resize all inputs to 224 × 224 pixels before a neural network.
  4. Histogram Equalization — adjusts brightness & contrast by spreading pixel intensities evenly, enhancing detail in dark or bright regions.
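Step 2 above (normalization) can be sketched with NumPy; the 2 × 2 array here is a hypothetical stand-in for a real image:

```python
import numpy as np

# Hypothetical 2 x 2 grayscale "image" with 8-bit pixel values (0-255).
img = np.array([[0, 64],
                [128, 255]], dtype=np.uint8)

# Image normalization: rescale pixel values from 0-255 to 0-1.
normalized = img.astype(np.float32) / 255.0

print(normalized.min(), normalized.max())  # 0.0 1.0
```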

🔍 Stage 3 · Feature Extraction

Identifies and extracts relevant visual patterns or attributes. Common techniques:

  1. Edge Detection — locates abrupt intensity changes that mark object boundaries.
  2. Corner Detection — finds points where edges meet, useful for matching and tracking.
  3. Texture Analysis — characterises surface patterns such as smoothness or roughness.
  4. Colour-Based Features — describes regions by their colour distributions.

In deep-learning approaches, feature extraction is performed automatically by Convolutional Neural Networks (CNNs) during training.

🎯 Stage 4 · Detection / Segmentation

Identifies objects or regions of interest. Split into two broad categories:

🔹 Single-Object Tasks — Image Classification (assigning one class label to the whole image; e.g., KNN for supervised classification, K-means for unsupervised grouping) and Classification + Localization (predicting the class plus a bounding box around the object).
🔹 Multiple-Object Tasks — Object Detection (locating and classifying every object; popular models include R-CNN, R-FCN, YOLO and SSD) and Image Segmentation (labelling every pixel, either Semantic or Instance segmentation).
🔹 Key Difference: Classification vs Detection

Classification considers the whole image and predicts one class. Detection identifies multiple objects in an image and classifies each.
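The difference can be illustrated with hypothetical output structures — the labels and box coordinates below are invented for illustration, not the output of any specific model:

```python
# Classification: the whole image maps to a single class label.
classification_output = "cat"

# Detection: each object gets its own label plus a bounding box,
# given here as (x_min, y_min, x_max, y_max) pixel coordinates.
detection_output = [
    {"label": "cat", "box": (34, 50, 120, 160)},
    {"label": "dog", "box": (200, 40, 310, 180)},
]

print(classification_output)   # one answer per image
print(len(detection_output))   # one answer per object: 2
```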

🧠 Stage 5 · High-Level Processing

The final stage — interpreting and extracting meaningful information. Tasks include recognising objects, understanding scenes and analysing context. Empowers CV systems to extract valuable insights and drive intelligent decision-making in domains from autonomous driving to medical diagnostics.

Learning Outcome 3: Identify real-world applications of computer vision technology in various industries and understand how it enhances efficiency and productivity

3.1 Applications of Computer Vision

  • Facial recognition — unlocking phones, tagging photos.
  • Healthcare — analysing X-rays, CT and MRI scans.
  • Self-driving vehicles — detecting lanes, pedestrians and traffic signs.
  • Optical Character Recognition (OCR) — reading printed or handwritten text.
  • Machine inspection — spotting defects on production lines.
  • 3D modelling — reconstructing scenes from images.
  • Surveillance — monitoring public spaces in real time.
  • Biometrics — identification via fingerprint, iris or face.

Learning Outcome 4: Evaluate the ethical implications and challenges associated with computer vision, including privacy concerns and the spread of misinformation

4.1 Challenges of Computer Vision

🧩 1. Reasoning and Analytical Issues

CV relies on more than just image identification — it requires accurate interpretation. Without robust reasoning, extracting meaningful insights is limited.

📸 2. Difficulty in Image Acquisition

Hindered by lighting variations, perspectives, scales, occlusions and complex multi-object scenes. Obtaining high-quality data amid these challenges is crucial.

🔐 3. Privacy & Security Concerns

Vision-powered surveillance can infringe on privacy rights. Facial recognition raises ethical dilemmas — regulatory scrutiny and public debate surround such technologies.

🎭 4. Duplicate & False Content

Malicious actors can create misleading or fraudulent content (deepfakes, forged images). Data breaches foster misinformation and reputational damage.

Learning Outcome 5: Envision the future possibilities of computer vision technology

5.1 The Future of Computer Vision

CV has evolved from basic image processing to systems that understand and interpret visual data with human-like precision. Breakthroughs in deep learning and the availability of vast labelled training datasets have propelled the field forward.

🔹 What's Next?

Emerging directions include personalised healthcare, AR/VR experiences, smart cities, agriculture and education. By embracing innovation, fostering collaboration and prioritising ethics, we can harness the transformative power of CV for humanity.

Activity 3.2 — Creating a Website Containing an ML Model (Teachable Machine + Weebly)
  1. Visit teachablemachine.withgoogle.com → Get Started.
  2. Choose Image project → Standard Image Model.
  3. Add Class 1 = "Kittens" with uploaded photos, Class 2 = "Puppies".
  4. Click Train Model.
  5. Test with the webcam or an uploaded image.
  6. Click Export Model → Upload my model, then copy the JavaScript snippet.
  7. Paste it into Notepad and save as web.html.
  8. Create a free account on Weebly, add an Embed Code block and paste the JavaScript.
  9. Click Publish on the Weebly subdomain.
  10. Open the URL and test your ML-powered website!

Learning Outcome 6 (for Advanced Learners): Develop basic skills in using OpenCV and deploying machine learning models online

6.1 Introduction to OpenCV

OpenCV (Open-Source Computer Vision Library) — a cross-platform library for developing real-time computer vision applications. Focuses on image processing, video capture and analysis, with features like face detection, object detection, and handwriting recognition.
🔹 Installing OpenCV
pip install opencv-python

6.2 Loading & Displaying an Image

import cv2

image = cv2.imread('example.jpg')   # load image
cv2.imshow('original image', image) # display image
cv2.waitKey(0)                      # wait for any key press
cv2.destroyAllWindows()             # close all OpenCV windows
🔹 Function Reference

  • cv2.imread(path) — loads an image from disk into a NumPy array (channels in BGR order).
  • cv2.imshow(title, image) — displays the image in a window with the given title.
  • cv2.waitKey(0) — waits indefinitely for a key press (a positive argument waits that many milliseconds).
  • cv2.destroyAllWindows() — closes all OpenCV windows.

6.3 Resizing an Image

import cv2

image = cv2.imread('example.jpg')
new_width  = 300
new_height = 300
resized_image = cv2.resize(image, (new_width, new_height))

cv2.imshow('Resized Image', resized_image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Use cv2.resize(image, (width, height)) to set fixed dimensions. Common use: standardise all inputs (e.g., 300 × 300 or 224 × 224) before feeding them into a model.

6.4 Converting an Image to Grayscale

Grayscale images reduce computational complexity by collapsing the three colour channels into a single intensity channel.

import cv2

image = cv2.imread('example.jpg')
grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

cv2.imshow('Grayscale Image', grayscale_image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Use cv2.cvtColor(src, cv2.COLOR_BGR2GRAY) — the colour-conversion code cv2.COLOR_BGR2GRAY converts BGR → grayscale.
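Under the hood, the BGR → grayscale conversion is a weighted sum of the three channels; OpenCV uses the ITU-R BT.601 luminance weights shown below. The sample pixel is hypothetical:

```python
# One BGR pixel: pure red, in OpenCV's Blue, Green, Red channel order.
b, g, r = 0.0, 0.0, 255.0

# Grayscale intensity = 0.114*B + 0.587*G + 0.299*R (BT.601 weights).
gray = 0.114 * b + 0.587 * g + 0.299 * r

print(round(gray))  # 76 -- pure red becomes a fairly dark grey
```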

Check Your Progress — quick MCQ pointers:
  • Field that helps computers "see" → Computer Vision.
  • Task of assigning a class label to an input image → Image Classification.
  • 1 byte = 8 bits. Monochrome pixel range: 0–255.
  • Capturing a digital image → Image Acquisition.
  • KNN is used for supervised classification; K-means for unsupervised.
  • A computer sees an image as a series of pixels.
  • High-level processing drives intelligent decision-making.
  • Edge detection identifies abrupt intensity changes / object boundaries.
  • Preprocessing does not include edge/corner detection — those are feature-extraction steps.
  • Incorrect: "RGB is only for camera images" (it applies to all colour images) and "fewer pixels resemble the original image better" (more pixels = more detail).

Quick Revision — Key Points to Remember

  • Computer Vision (CV) = AI field enabling machines to see, observe & understand visual data; a.k.a. Machine Vision.
  • Human vs CV analogy: Retina → Sensor · Optic Nerve → Data Bus · Visual Cortex → Model.
  • Why CV surpasses humans: speed · objectivity · continuity · accuracy · scalability.
  • Digital image = grid of pixels (picture elements); more pixels = higher resolution = finer detail.
  • Monochrome pixel: 0 (black) to 255 (white); 1 byte = 8 bits = 2⁸ = 256 values.
  • RGB colour model: 3 channels (Red · Green · Blue); each 0–255 → over 16 million colours.
  • CV Process — 5 Stages: Image Acquisition → Preprocessing → Feature Extraction → Detection/Segmentation → High-Level Processing.
  • Preprocessing techniques: Noise Reduction · Image Normalization · Resizing/Cropping · Histogram Equalization.
  • Feature Extraction: Edge Detection · Corner Detection · Texture Analysis · Colour-Based (CNN does it automatically in DL).
  • Single-object tasks: Classification (KNN supervised, K-means unsupervised) · Classification + Localization (bounding box).
  • Multi-object tasks: Object Detection (R-CNN · R-FCN · YOLO · SSD) · Image Segmentation (Semantic · Instance).
  • Classification vs Detection: Whole image → single class vs Multiple objects → bounding boxes + classes.
  • Applications: Facial recognition · Healthcare · Self-driving · OCR · Machine inspection · 3D modelling · Surveillance · Biometrics.
  • 4 Challenges: Reasoning · Image Acquisition difficulty · Privacy/Security · Duplicate/False content (deepfakes).
  • Future: Personalised healthcare · AR/VR · Smart cities · Agriculture · Education — with ethics at the core.
  • Activities: Binary Art (0s/1s re-form the image) · Website with ML model via Teachable Machine + Weebly.
  • OpenCV essentials: pip install opencv-python · cv2.imread · cv2.imshow · cv2.waitKey(0) · cv2.destroyAllWindows · cv2.resize · cv2.cvtColor(..., cv2.COLOR_BGR2GRAY).