VM-LEARNING /class.xi ·track.cs ·ch-1-6 session: 2026_27
UNIT 1 ▪ CHAPTER 6
Encoding Schemes
ASCII · ISCII · Unicode · UTF-8 · UTF-32
Learning Outcome 1: Explain encoding schemes — ASCII, ISCII and Unicode (UTF-8 / UTF-32).

6.1 Why Encoding?

A computer can store numbers directly in binary, but what about letters like A, letters from other scripts, or a smiley emoji? Each such character must first be assigned a unique number (called its code point); the computer then simply stores that number in binary. The rule that maps every character to a number is called a character encoding scheme.

How the letter "A" is stored: Character A → [encoding] → Code point 65 (ASCII) → [binary] → Binary stored 01000001. Every keystroke goes through this chain before hitting the disk.
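The chain above can be traced in Python with the built-in functions ord(), chr() and format(). This is only a minimal sketch; the helper name to_binary is made up for illustration and is not part of the chapter.

```python
def to_binary(ch):
    """Return the binary text for one character, padded to at least 8 bits."""
    code_point = ord(ch)               # character -> code point (a number)
    return format(code_point, "08b")   # code point -> binary digits as text

print(ord("A"))         # 65
print(to_binary("A"))   # 01000001
print(chr(65))          # A  (the reverse step: code point -> character)
```

Characters with code points above 255 simply produce more than 8 binary digits here; how those are actually laid out in bytes depends on the encoding scheme.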

6.2 ASCII — American Standard Code for Information Interchange

Character  | ASCII (decimal) | ASCII (binary, 8-bit)
A          | 65              | 01000001
B          | 66              | 01000010
Z          | 90              | 01011010
a          | 97              | 01100001
z          | 122             | 01111010
0          | 48              | 00110000
9          | 57              | 00111001
Space      | 32              | 00100000
!          | 33              | 00100001
Enter (CR) | 13              | 00001101
Handy facts:
  • Upper-case letters run from 65 (A) to 90 (Z).
  • Lower-case letters run from 97 (a) to 122 (z).
  • The difference between the upper- and lower-case codes of the same letter is exactly 32.
  • Digits '0' to '9' run from 48 to 57. The digit character '5' is not the number 5; its code is 53!
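The handy facts can be checked with a little code-point arithmetic. The sketch below assumes plain ASCII input; the function names to_upper and digit_value are invented for this example.

```python
def to_upper(ch):
    """Convert a lower-case ASCII letter to upper-case by subtracting 32."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - 32)
    return ch  # anything else is returned unchanged

def digit_value(ch):
    """Turn a digit character such as '5' into the number 5."""
    return ord(ch) - ord("0")   # for '5': 53 - 48 = 5

print(ord("a") - ord("A"))   # 32
print(to_upper("h"))         # H
print(ord("5"))              # 53
print(digit_value("5"))      # 5
```

In everyday Python you would use str.upper() and int() instead; the arithmetic is shown only to make the ASCII layout visible.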

6.3 ISCII — Indian Script Code for Information Interchange

6.4 Unicode — One Code for Every Character in the World

By the 1990s, with the Internet spreading globally, each country or region had its own encoding (Shift-JIS for Japanese, GB for Chinese, ISCII for Indian scripts, etc.). This made sharing documents across regions a nightmare. Unicode was designed to end the chaos: one universal code table for every character of every living script, plus historical scripts, mathematical symbols and even emojis.

6.4.1 UTF-8 vs UTF-32

A code point is a number; UTF-8 and UTF-32 are two different ways of encoding that number as bytes on disk.

Scheme | Bytes per character | ASCII-compatible? | Pros | Cons
UTF-8 | Variable: 1 to 4 bytes | Yes: ASCII fits in 1 byte unchanged | Space-efficient for English text; dominant on the Web and in files. | Slightly more work to find the n-th character (variable length).
UTF-32 | Fixed: always 4 bytes | No: every ASCII char takes 4 bytes | Every character is the same size, so it is easy to index. | Wastes space for English-heavy text.
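The indexing trade-off in the table can be sketched in Python. With UTF-32 the n-th character always starts at byte 4 × n. With UTF-8 the program has to scan from the start; the sketch relies on a fact about UTF-8 that this chapter does not cover, namely that every non-leading byte of a character has the bit pattern 10xxxxxx. The function names and the sample text are made up for illustration.

```python
def nth_char_utf32(data, n):
    """Fixed width: character n always starts at byte offset 4 * n."""
    return data[4 * n : 4 * n + 4].decode("utf-32-be")

def nth_char_utf8(data, n):
    """Variable width: scan all bytes to find where each character begins."""
    # A byte of the form 10xxxxxx continues a character; any other byte starts one.
    starts = [i for i, b in enumerate(data) if b & 0xC0 != 0x80]
    starts.append(len(data))   # end marker so the last character can be sliced
    return data[starts[n] : starts[n + 1]].decode("utf-8")

text = "aé€😀"   # characters that need 1, 2, 3 and 4 bytes in UTF-8
print(nth_char_utf32(text.encode("utf-32-be"), 2))   # €
print(nth_char_utf8(text.encode("utf-8"), 2))        # €
```

Both functions return the same character, but the UTF-8 version has to look at every byte before it, which is the "slightly more work" mentioned in the table.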
Same word, different encodings. The word "Hi" (2 characters), with each byte written in hexadecimal:
ASCII       :  48 69                         (2 bytes)
UTF-8       :  48 69                         (2 bytes, same as ASCII)
UTF-32      :  00 00 00 48  00 00 00 69      (8 bytes)
The word "नमस्ते" cannot be written in ASCII at all. It has 6 code points: UTF-8 stores each in 3 bytes (18 bytes in total), while UTF-32 needs exactly 24 bytes.
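These byte counts can be reproduced with Python's str.encode(). In this minimal sketch the helper byte_report is a made-up name, and "utf-32-be" is chosen deliberately: plain "utf-32" in Python also adds a 4-byte byte-order mark at the front, which would inflate the counts.

```python
def byte_report(text, encoding):
    """Return (number of bytes, bytes written as hexadecimal pairs)."""
    data = text.encode(encoding)
    return len(data), " ".join(f"{b:02x}" for b in data)

print(byte_report("Hi", "ascii"))        # (2, '48 69')
print(byte_report("Hi", "utf-8"))        # (2, '48 69')
print(byte_report("Hi", "utf-32-be"))    # (8, '00 00 00 48 00 00 00 69')

word = "नमस्ते"
print(len(word))                           # 6 code points
print(byte_report(word, "utf-8")[0])       # 18
print(byte_report(word, "utf-32-be")[0])   # 24

try:
    word.encode("ascii")
except UnicodeEncodeError:
    print("ASCII cannot represent this word")
```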
Why UTF-8 won the Web: it is backward compatible with ASCII, keeps English text tiny, and can still represent every Unicode character. Today more than 98% of all web pages are served in UTF-8. When you save a Python .py file in VS Code, it is UTF-8 by default.

📌 Quick Revision — Chapter 6 at a Glance

  • Encoding = rule mapping every character to a unique number (code point).
  • ASCII (1963) — 7-bit / 128 codes; extended 8-bit / 256. 'A' = 65, 'a' = 97, diff = 32.
  • ISCII — 8-bit Indian standard (BIS, 1991). Lower 128 = ASCII; upper 128 = Indian scripts.
  • Unicode — one universal table, code points written U+HHHH. Over 1,40,000 characters.
  • UTF-8 — variable 1–4 bytes, ASCII-compatible, over 98% of the Web.
  • UTF-32 — fixed 4 bytes per character — easy to index, wasteful for English text.