PART B ▪ UNIT 3 · Making Machines See

Computer Vision · Pixels · Process Stages · Applications · OpenCV

Computer Vision (CV): a field of AI that enables systems to see, observe and understand the visual world. It uses sensing devices and deep-learning models to derive meaningful information from digital images, videos and other visual inputs, and to make recommendations or take actions based on that information. Closely related to, and often used interchangeably with, Machine Vision.

Introduction — Why Machines Need to See

With social media (Facebook, Instagram, Twitter) and smartphone cameras, billions of images and videos are shared every day. Unlike text, images are hard to index and search: algorithms must organise image data by colour, texture, shape or metadata for quick retrieval. Traditionally, image search relied on manually written meta-descriptions; today we need computers that can visually perceive the images themselves.

Just as children learn to associate a picture of an apple with the letter "A", computers must develop similar associations by repeatedly viewing labelled images.

Key Concepts You'll Learn

Introduction to Computer Vision
Working of Computer Vision
Applications of Computer Vision
Challenges of Computer Vision
The Future of Computer Vision

Prerequisites: a basic understanding of digital imaging and some knowledge of machine learning.

Learning Outcome 1: Explain the concept of computer vision and its significance in analysing visual data

1.1 How Do Machines See?

Computer Vision is analogous to human vision:

Human Vision	Computer Vision
Retina	Camera sensor
Optic nerve	Data bus
Visual cortex	Deep-learning model / algorithm

CV systems can inspect products, infrastructure or production assets in real time, often spotting defects or issues faster than human inspectors. Thanks to their speed, objectivity, continuity (they do not tire), accuracy and scalability, they can outperform humans on narrow, well-defined tasks. On some standard benchmarks for facial recognition, object detection and image classification, modern deep-learning models match or exceed reported human accuracy.

Learning Outcome 2: Demonstrate an understanding of the key stages involved in the computer vision process and their roles in interpreting images and videos

2.1 Working of Computer Vision — Basics of Digital Images

Digital Image: a picture stored on a computer as a grid of numbers. It can be created with design software (Paint, Photoshop), captured with a digital camera, or produced by a scanner.

2.2 Interpretation of an Image in Digital Form

Pixel = "Picture Element": the smallest unit of a digital image, usually displayed as a tiny square. Each pixel stores a specific colour value.

During digitisation, an image is converted into a grid of pixels.
Resolution = number of pixels in the image, usually quoted as width × height (e.g., 1920 × 1080). Higher resolution → finer detail, closer to the original scene.
For 8-bit monochrome (grayscale) images, each pixel's value ranges from 0 (black) to 255 (white), with shades of grey in between.
1 byte = 8 bits → 2⁸ = 256 distinct values (0–255).
For coloured images, each pixel has 3 numbers based on the RGB (Red · Green · Blue) colour model. Each channel ranges from 0 to 255, giving 256³ = 16,777,216 (over 16 million) possible colours per pixel.
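Here is a minimal sketch of the ideas above. It assumes NumPy, the array library that OpenCV images are built on. The tiny images and their pixel values are hypothetical. This sketch stores colour pixels in RGB order, whereas OpenCV loads them in BGR order.

```python
import numpy as np

# Hypothetical 2 x 3 grayscale image: one 8-bit value (0-255) per pixel.
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

# Hypothetical 2 x 2 colour image: three values (R, G, B) per pixel.
colour = np.array([[[255, 0, 0], [0, 255, 0]],
                   [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

height, width = gray.shape
num_pixels = height * width                    # resolution as a pixel count

values_per_channel = 2 ** 8                    # 1 byte = 8 bits -> 256 values
colours_per_pixel = values_per_channel ** 3    # every R x G x B combination

print(gray.shape, colour.shape)                # (2, 3) (2, 2, 3)
print(num_pixels)                              # 6
print(values_per_channel, colours_per_pixel)   # 256 16777216
```

The grayscale array has two dimensions (rows × columns). The colour array adds a third dimension of size 3, one entry per channel.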

Binary-to-Decimal Quick Reference

Binary 00000000 = 2⁷·0 + 2⁶·0 + … + 2⁰·0 = 0 (black).
Binary 11111111 = 2⁷·1 + 2⁶·1 + … + 2⁰·1 = 255 (white).
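The same place-value arithmetic can be checked in Python. This is a small illustrative sketch. `binary_to_decimal` is a hypothetical helper name. Python's built-in `int(..., 2)` and `format(..., "08b")` are shown for comparison.

```python
def binary_to_decimal(bits: str) -> int:
    """Hypothetical helper: sum each bit times its place value 2**position (rightmost bit = 2**0)."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

print(binary_to_decimal("00000000"))  # 0   (black)
print(binary_to_decimal("11111111"))  # 255 (white)
print(binary_to_decimal("10000000"))  # 128 (a mid-grey)

# Python's built-ins perform the same conversion in both directions.
print(int("11111111", 2), format(255, "08b"))  # 255 11111111
```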

Activity 3.1 — Binary Art: Recreating Images with 0s and 1s. (1) Choose any image from Pixabay / Unsplash / Pexels. (2) Resize to 200–300 px using imageresizer.com. (3) Convert to grayscale with pinetools.com. (4) Extract pixel values with Boxentriq Pixel Value Extractor. (5) Copy all values → paste into a Word / Google Doc. (6) Select all and set font size to 1. (7) Observe the original image re-emerge from 0s and 1s!
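The activity uses online tools, but the underlying idea can also be sketched offline in Python. This minimal sketch makes several assumptions:

- NumPy is available.
- A grayscale image is already loaded as an array (here, a tiny hypothetical one).
- Pixels at or above a cut-off of 128 become 1 and darker pixels become 0.

The online tools named in the activity may format their output differently. `to_binary_art` is a hypothetical helper.

```python
import numpy as np

def to_binary_art(gray: np.ndarray, threshold: int = 128) -> str:
    """Hypothetical helper: map each pixel to '1' (bright) or '0' (dark), one text line per row."""
    bits = (gray >= threshold).astype(np.uint8)   # 1 where the pixel is at least the threshold
    return "\n".join("".join(str(b) for b in row) for row in bits)

# Hypothetical 4 x 6 grayscale image: a bright patch on a dark background.
gray = np.array([[ 10,  20,  30,  20,  10,   0],
                 [ 15, 200, 220, 210,  25,   5],
                 [ 12, 230, 250, 240,  30,   8],
                 [  5,  10,  15,  10,   5,   0]], dtype=np.uint8)

print(to_binary_art(gray))
# 000000
# 011100
# 011100
# 000000
```

On a real photo resized to 200–300 px, the printed block of 0s and 1s shows the same effect as the activity: the shape re-emerges when the text is viewed at a tiny font size.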

2.3 Computer Vision Process — The 5 Stages

Stage 1 · Image Acquisition

The first stage: capturing digital images or videos, which form the raw data for all subsequent analysis.

Sources: digital cameras, scanners (for physical photos/documents) and images generated by design software.
Quality depends on device capability & resolution, lighting conditions and angle.
Specialised scientific imaging: MRI (Magnetic Resonance Imaging), CT (Computed Tomography) scans for medical diagnosis.

Stage 2 · Preprocessing

Aims to enhance the quality of the acquired image. Main goals: remove noise · highlight important features · ensure consistency across the dataset.

Noise Reduction: removes random speckles, spots and distortions (noise) introduced during capture. Example: smoothing out the grainy effect in low-light photos.
Image Normalization: standardises pixel values to a consistent range (e.g., 0–1 or −1 to +1). Example: scaling 0–255 → 0–1.
Resizing / Cropping: gives all images the same dimensions. Example: resize all inputs to 224 × 224 pixels before feeding them to a neural network.
Histogram Equalization: adjusts brightness and contrast by spreading pixel intensities more evenly across the full range, revealing detail in dark or bright regions.
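Two of the preprocessing steps above, normalization and histogram equalization, can be sketched in a few lines of NumPy. This is an illustrative implementation under two assumptions: 8-bit grayscale input, and the common CDF-based equalization formula. It is not the exact code any library uses. For real work, OpenCV offers `cv2.equalizeHist` for 8-bit single-channel images.

```python
import numpy as np

def normalize_0_1(image: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from 0-255 to 0.0-1.0."""
    return image.astype(np.float32) / 255.0

def normalize_minus1_1(image: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from 0-255 to -1.0..+1.0."""
    return image.astype(np.float32) / 127.5 - 1.0

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """CDF-based histogram equalization for an 8-bit grayscale image (illustrative sketch).

    Each intensity is remapped through the normalized cumulative histogram,
    so the intensities actually present spread across the full 0-255 range.
    """
    hist = np.bincount(gray.ravel(), minlength=256)  # pixel count per intensity
    cdf = hist.cumsum()                              # cumulative pixel counts
    cdf_min = cdf[cdf > 0][0]                        # CDF at the darkest intensity present
    total = gray.size
    if total == cdf_min:                             # single-intensity image: nothing to spread
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)      # lookup table: old intensity -> new
    return lut[gray]

# Hypothetical low-contrast image: all values squeezed between 100 and 130.
low_contrast = np.array([[100, 110], [120, 130]], dtype=np.uint8)
print(normalize_0_1(low_contrast))
print(equalize_histogram(low_contrast))  # values become 0, 85, 170 and 255
```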

Stage 3 · Feature Extraction

Identifies and extracts relevant visual patterns or attributes. Common techniques:

Edge Detection — identifies boundaries where there is a significant change in intensity.
Corner Detection — identifies points where two or more edges meet (high-curvature areas).
Texture Analysis — extracts features like smoothness, roughness or repetition.
Colour-Based Feature Extraction — quantifies colour distributions to discriminate objects/regions.

In deep-learning approaches, feature extraction is performed automatically by Convolutional Neural Networks (CNNs) during training.
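Edge detection, as described above, can be illustrated with the Sobel operator. The Sobel operator is a classic choice, but the text does not name a specific method. The sketch assumes NumPy, an 8-bit grayscale array and an arbitrary threshold of 100 on gradient magnitude. `filter3x3` and `sobel_edges` are hypothetical helpers. OpenCV provides ready-made equivalents such as `cv2.Sobel` and `cv2.Canny`.

```python
import numpy as np

# Sobel kernels: respond to intensity changes left-to-right (x) and top-to-bottom (y).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(gray: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a 3x3 kernel over the image (cross-correlation, no padding: output shrinks by 2).

    True convolution would flip the kernel, which only flips the sign here,
    so the edge magnitude computed below is the same.
    """
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return out

def sobel_edges(gray: np.ndarray, threshold: float = 100.0) -> np.ndarray:
    """Return a 0/1 edge map: 1 where the gradient magnitude exceeds the threshold."""
    g = gray.astype(np.float64)
    gx = filter3x3(g, SOBEL_X)          # horizontal intensity change
    gy = filter3x3(g, SOBEL_Y)          # vertical intensity change
    magnitude = np.hypot(gx, gy)        # combined strength of the change
    return (magnitude > threshold).astype(np.uint8)

# Hypothetical image: dark left half, bright right half, so one vertical boundary.
img = np.zeros((5, 6), dtype=np.uint8)
img[:, 3:] = 255
print(sobel_edges(img))  # 1s mark the columns around the dark/bright boundary
```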

Stage 4 · Detection / Segmentation

Identifies objects or regions of interest. These tasks fall into two broad categories:

Single-Object Tasks

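Classification: determines the category/class of the single object in an image. Algorithms: KNN (K-Nearest Neighbour) for supervised learning with labelled examples; K-means Clustering for unsupervised learning, where images are grouped into clusters without predefined labels.
Classification + Localization: also draws a bounding box tightly around the object.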
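The sketch below is a minimal K-Nearest Neighbour classifier. It assumes each image has already been reduced to a short feature vector, for example by the feature-extraction stage. The feature values and the "cat"/"dog" labels are hypothetical toy data. `knn_predict` is an illustrative helper, not a library function.

```python
from collections import Counter

import numpy as np

def knn_predict(train_X: np.ndarray, train_y: list, query: np.ndarray, k: int = 3) -> str:
    """Classify `query` by majority vote among its k nearest training examples (Euclidean distance)."""
    distances = np.linalg.norm(train_X - query, axis=1)  # distance to every labelled example
    nearest = np.argsort(distances)[:k]                  # indices of the k closest examples
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors, e.g. two numbers extracted from each image.
train_X = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],   # "cat" examples
                    [4.0, 4.2], [4.3, 3.9], [3.8, 4.1]])  # "dog" examples
train_y = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(train_X, train_y, np.array([1.0, 1.0])))  # cat
print(knn_predict(train_X, train_y, np.array([4.1, 4.0])))  # dog
```

KNN needs the labels in `train_y`. K-means, by contrast, would receive only `train_X` and group the points into clusters that a person would then have to name.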

Multiple-Object Tasks

Object Detection: identifies and locates multiple objects, drawing a bounding box with a class label for each. Algorithms: R-CNN (Region-Based CNN), R-FCN (Region-Based Fully Convolutional Network), YOLO (You Only Look Once), SSD (Single Shot Detector).
Image Segmentation: creates a pixel-wise mask for each object. Classical methods use edge detection to find discontinuities in brightness; modern systems usually learn the masks with deep neural networks. Two popular kinds:
- Semantic Segmentation: classifies each pixel by class; objects of the same class are not differentiated (e.g., all "animals" share one mask).
- Instance Segmentation: differentiates every object even when they share a class (each animal gets its own mask).
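The difference between the two kinds of segmentation, and how a bounding box relates to a mask, can be sketched with NumPy. The sketch assumes the semantic mask is already given. It splits the mask into instances by connected-component labelling, a simple classical technique that only works when objects do not touch; real instance-segmentation models learn to separate objects. The mask and the helpers `label_instances` and `bounding_box` are hypothetical.

```python
from collections import deque

import numpy as np

def label_instances(mask: np.ndarray):
    """Give each 4-connected blob of non-zero pixels its own ID (1, 2, ...); returns (labels, count)."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    next_id = 0
    h, w = mask.shape
    for r in range(h):
        for c in range(w):
            if mask[r, c] and labels[r, c] == 0:      # unlabelled object pixel: start a new instance
                next_id += 1
                labels[r, c] = next_id
                queue = deque([(r, c)])
                while queue:                          # breadth-first flood fill of this blob
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = next_id
                            queue.append((ny, nx))
    return labels, next_id

def bounding_box(labels: np.ndarray, instance_id: int):
    """Tightest box (x_min, y_min, x_max, y_max) around one labelled instance."""
    ys, xs = np.nonzero(labels == instance_id)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Hypothetical semantic mask: 1 = "animal" pixel, 0 = background.
# Both animals share the value 1, so the semantic mask alone cannot tell them apart.
semantic_mask = np.array([[1, 1, 0, 0, 0],
                          [1, 1, 0, 0, 0],
                          [0, 0, 0, 1, 1],
                          [0, 0, 0, 1, 1]], dtype=np.uint8)

instances, count = label_instances(semantic_mask)
print(count)                                   # 2 separate animals
for i in range(1, count + 1):
    print(i, bounding_box(instances, i))       # 1 (0, 0, 1, 1) then 2 (3, 2, 4, 3)
```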

Key Difference: Classification vs Detection

Classification considers the whole image and predicts one class. Detection identifies multiple objects in an image and classifies each.

Stage 5 · High-Level Processing

The final stage: interpreting the results and extracting meaningful information. Tasks include recognising objects, understanding scenes and analysing context. This stage lets CV systems turn raw detections into insights that drive intelligent decisions, in domains ranging from autonomous driving to medical diagnostics.

Learning Outcome 3: Identify real-world applications of computer vision technology in various industries and understand how it enhances efficiency and productivity

3.1 Applications of Computer Vision

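Facial recognition: identifies or verifies people from their faces. Facebook, for example, used it to detect users in photos and suggest tags until it shut the feature down in 2021.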
Healthcare — evaluates cancerous tumours, identifies diseases and abnormalities, tracks objects in medical imaging (MRI, CT, X-ray).
Self-Driving Vehicles — capture video around the car to detect other vehicles, traffic signals, pedestrian paths.
Optical Character Recognition (OCR) — extracts printed or handwritten text from images (invoices, bills, articles).
Machine Inspection — detects defects, functional flaws, and irregularities in manufactured products using tuned lighting and handling.
3D Model Building — constructs 3D models of objects for robotics, autonomous driving, 3D tracking, scene reconstruction, AR/VR.
Surveillance — live CCTV analysis to identify suspicious behaviour and dangerous objects; maintains law and order.
Fingerprint Recognition & Biometrics — validates user identity for banking, immigration, attendance.

Learning Outcome 4: Evaluate the ethical implications and challenges associated with computer vision, including privacy concerns and the spread of misinformation

4.1 Challenges of Computer Vision

1. Reasoning and Analytical Issues

CV requires more than identifying what is in an image; it must also interpret the scene correctly. Without robust reasoning, a system may label objects accurately yet still fail to extract meaningful insights.

2. Difficulty in Image Acquisition

Image acquisition can be hindered by lighting variations, different perspectives and scales, occlusions (objects partly hidden behind others) and complex multi-object scenes. Obtaining high-quality data despite these challenges is crucial.

3. Privacy & Security Concerns

Vision-powered surveillance can infringe on privacy rights. Facial recognition in particular raises ethical dilemmas and faces regulatory scrutiny and public debate.

4. Duplicate & False Content

Malicious actors can create misleading or fraudulent visual content, such as deepfakes and forged images. Breached or leaked image data can likewise be misused, spreading misinformation and causing reputational damage.

Learning Outcome 5: Envision the future possibilities of computer vision technology

5.1 The Future of Computer Vision

CV has evolved from basic image processing to systems that, on many tasks, understand and interpret visual data with human-like precision. Breakthroughs in deep learning and the availability of vast labelled training datasets have propelled the field forward.

What's Next?

Personalised Healthcare Diagnostics — CV-powered tools for early disease detection.
Immersive AR / VR — real-time scene understanding for headsets and smart glasses.
Smart Cities — traffic monitoring, waste sorting, infrastructure inspection.
Agriculture — crop health analysis, pest detection, yield prediction from drone imagery.
Education — automatic answer-sheet checking, engagement analysis.

By embracing innovation, fostering collaboration and prioritising ethics, we can harness the transformative power of CV for humanity.

Activity 3.2 — Creating a Website Containing an ML Model (Teachable Machine + Weebly). (1) Visit teachablemachine.withgoogle.com → Get Started. (2) Choose Image project → Standard Image Model. (3) Add Class 1 = "Kittens" with uploaded photos, Class 2 = "Puppies". (4) Click Train Model. (5) Test with webcam or uploaded image. (6) Click Export Model → Upload my model, then Copy the JavaScript snippet. (7) Paste into Notepad and save as web.html. (8) Create a free account on Weebly, add an Embed Code block and paste the JavaScript. (9) Click Publish on the Weebly subdomain. (10) Open the URL and test your ML-powered website!

Learning Outcome 6 (for Advanced Learners): Develop basic skills in using OpenCV and deploying machine learning models online

6.1 Introduction to OpenCV

OpenCV (Open Source Computer Vision Library): a cross-platform, open-source library for developing real-time computer vision applications. It focuses on image processing and on video capture and analysis, with features such as face detection, object detection and handwriting recognition.

Installing OpenCV

pip install opencv-python

6.2 Loading & Displaying an Image

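import cv2

image = cv2.imread('example.jpg')   # load image as a BGR NumPy array
if image is None:                   # imread returns None (no exception) on a bad path
    raise FileNotFoundError('Could not read example.jpg')
cv2.imshow('original image', image) # display image in a window
cv2.waitKey(0)                      # wait for any key press
cv2.destroyAllWindows()             # close all OpenCV windows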

Function Reference

cv2.imread('path'): loads an image into a NumPy array, with colour channels in BGR order; returns None (rather than raising an error) if the file cannot be read.
cv2.imshow('title', image): opens a window displaying the image.
cv2.waitKey(0): waits indefinitely for a key press; a positive value instead waits at most that many milliseconds.
cv2.destroyAllWindows(): closes all OpenCV-created windows.

6.3 Resizing an Image

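import cv2

image = cv2.imread('example.jpg')
if image is None:                     # imread returns None on a bad path
    raise FileNotFoundError('Could not read example.jpg')

new_width  = 300
new_height = 300
# dsize is (width, height): the reverse of image.shape's (height, width)
resized_image = cv2.resize(image, (new_width, new_height))

cv2.imshow('Resized Image', resized_image)
cv2.waitKey(0)
cv2.destroyAllWindows()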

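Use cv2.resize(image, (width, height)) to set fixed dimensions. Note that the size is given width first, while image.shape reports (height, width, channels). Forcing a fixed size stretches the image if its aspect ratio differs. Common use: standardise all inputs (e.g., 300 × 300 or 224 × 224) before feeding them into a model.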
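A fixed size such as 300 × 300 can distort an image. A common alternative is to scale the image so that its longer side fits a target size while keeping the aspect ratio. `fit_within` is a hypothetical helper written for this sketch. It only computes the new size, so it runs without OpenCV.

```python
def fit_within(width: int, height: int, target: int = 300) -> tuple:
    """Hypothetical helper: (new_width, new_height) fitting inside target x target, aspect ratio kept."""
    scale = target / max(width, height)          # shrink (or grow) so the longer side equals target
    return max(1, round(width * scale)), max(1, round(height * scale))

print(fit_within(1920, 1080))   # (300, 169)
print(fit_within(600, 800))     # (225, 300)
```

With OpenCV, the result could be passed straight to cv2.resize. Remember that image.shape lists height before width: `h, w = image.shape[:2]`, then `resized = cv2.resize(image, fit_within(w, h))`.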

6.4 Converting an Image to Grayscale

Grayscale images reduce computational complexity by collapsing the three colour channels into a single intensity channel: one value per pixel instead of three.

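import cv2

image = cv2.imread('example.jpg')                          # loaded in BGR channel order
if image is None:                                          # imread returns None on a bad path
    raise FileNotFoundError('Could not read example.jpg')
grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # 3 channels -> 1

cv2.imshow('Grayscale Image', grayscale_image)
cv2.waitKey(0)
cv2.destroyAllWindows()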

Use cv2.cvtColor(src, code) to convert an image from one colour space to another. The conversion code cv2.COLOR_BGR2GRAY turns OpenCV's default BGR image into a single-channel grayscale image.
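You can reproduce what cv2.COLOR_BGR2GRAY computes with NumPy. OpenCV's documentation gives the weighted sum Y = 0.299·R + 0.587·G + 0.114·B for RGB/BGR-to-gray conversion. This sketch applies that formula to a BGR array. OpenCV's own internal rounding may differ by a small amount. `bgr_to_gray` and the sample pixels are hypothetical.

```python
import numpy as np

def bgr_to_gray(image_bgr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: Y = 0.299 R + 0.587 G + 0.114 B on an OpenCV-style BGR array."""
    b = image_bgr[..., 0].astype(np.float64)   # OpenCV stores blue first
    g = image_bgr[..., 1].astype(np.float64)
    r = image_bgr[..., 2].astype(np.float64)
    y = 0.299 * r + 0.587 * g + 0.114 * b      # green contributes most, blue least
    return np.clip(np.round(y), 0, 255).astype(np.uint8)

# Hypothetical 1 x 4 BGR image: pure blue, pure green, pure red, white.
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
gray = bgr_to_gray(pixels)
print(pixels.shape, gray.shape)   # (1, 4, 3) (1, 4): three channels collapse to one
print(gray)                       # values 29, 150, 76 and 255
```

The weights explain why pure green looks much brighter in grayscale than pure blue.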

Check Your Progress — quick MCQ pointers:

Field that helps computers "see" → Computer Vision.
Task of assigning a class label to an input image → Image Classification.
1 byte = 8 bits. Monochrome pixel range: 0–255.
Capturing a digital image → Image Acquisition.
KNN is used for supervised classification; K-means for unsupervised.
A computer sees an image as a series of pixels.
High-level processing drives intelligent decision-making.
Edge detection identifies abrupt intensity changes / object boundaries.
Preprocessing does not include edge/corner detection — those are feature-extraction steps.
Incorrect: "RGB is only for camera images" (it applies to all colour images) and "fewer pixels resemble the original image better" (more pixels = more detail).

Quick Revision — Key Points to Remember

Computer Vision (CV) = AI field enabling machines to see, observe & understand visual data; a.k.a. Machine Vision.
Human vs CV analogy: Retina → Sensor · Optic Nerve → Data Bus · Visual Cortex → Model.
Why CV can outperform humans on specific tasks: speed · objectivity · continuity · accuracy · scalability.
Digital image = grid of pixels (picture elements); more pixels = higher resolution = finer detail.
Monochrome pixel: 0 (black) to 255 (white); 1 byte = 8 bits = 2⁸ = 256 values.
RGB colour model: 3 channels (Red · Green · Blue); each 0–255 → over 16 million colours.
CV Process — 5 Stages: Image Acquisition → Preprocessing → Feature Extraction → Detection/Segmentation → High-Level Processing.
Preprocessing techniques: Noise Reduction · Image Normalization · Resizing/Cropping · Histogram Equalization.
Feature Extraction: Edge Detection · Corner Detection · Texture Analysis · Colour-Based (CNN does it automatically in DL).
Single-object tasks: Classification (KNN supervised, K-means unsupervised) · Classification + Localization (bounding box).
Multi-object tasks: Object Detection (R-CNN · R-FCN · YOLO · SSD) · Image Segmentation (Semantic · Instance).
Classification vs Detection: Whole image → single class vs Multiple objects → bounding boxes + classes.
Applications: Facial recognition · Healthcare · Self-driving · OCR · Machine inspection · 3D modelling · Surveillance · Biometrics.
4 Challenges: Reasoning · Image Acquisition difficulty · Privacy/Security · Duplicate/False content (deepfakes).
Future: Personalised healthcare · AR/VR · Smart cities · Agriculture · Education — with ethics at the core.
Activities: Binary Art (0s/1s re-form the image) · Website with ML model via Teachable Machine + Weebly.
OpenCV essentials: pip install opencv-python · cv2.imread · cv2.imshow · cv2.waitKey(0) · cv2.destroyAllWindows · cv2.resize · cv2.cvtColor(..., cv2.COLOR_BGR2GRAY).
