VM-LEARNING /class.xi ·track.ai ·ch-b7 session: 2026_27
PART B ▪ UNIT 7
12
Leveraging Linguistics and Computer Science
Language Complexity · NLP · Chatbots · 5 Phases · Applications
Combining the methods of computer science with linguistics improves machines' ability to understand and process human language. This multidisciplinary field — Natural Language Processing (NLP) — is what powers sentiment analysis, machine translation, search engines, voice assistants, chatbots, and more. Unit 7 explores how NLP handles the messiness of human language, the five phases of NLP processing, how chatbots are built, and real-world applications.
Learning Outcomes: Understand the complexities of language and the challenges of NLP · Learn techniques and algorithms for NLP tasks

1.1 Understanding Human Language Complexity

Linguistics is the scientific study of language: its structure, meaning (semantics), use in context (pragmatics), and role in society (sociolinguistics). Applying these principles strategically helps meet specific goals in marketing, advertising, education, or NLP. Understanding how language works lets you tailor messages, improve communication, and influence behaviour.

Human language is incredibly complex — full of strange expressions, metaphors that require cultural knowledge, grammatical structures that turn simple ideas into tongue-twisters. Machines need Natural Language Processing (NLP) to understand it.

1.2 Introduction to Natural Language Processing (NLP)

NLP is a branch of Artificial Intelligence that lets computers understand, create, and manipulate human language. It works with both written text and spoken voice, and is sometimes described as "language in". NLP powers virtual assistants such as ODA, Siri, Cortana, and Alexa, and tools such as Google Search, email spam filtering, auto-translation, document summarisation, sentiment analysis, and grammar / spell-checking.

Structured vs Unstructured Data

📊 Structured
Arranged in tables with neatly labelled rows & columns — computers work with this easily.
📝 Unstructured
Messy, free-form text — human language! Machines struggle with it.

How NLP Handles "Messiness"

1. Sentence Segmentation
Break the text into individual sentences.
2. Tokenisation
Split each sentence into small units (typically words) called tokens.
3. Structure & Classify
Sort tokens into categories: entities, relationships, concepts.
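The first two steps can be sketched with Python's standard `re` module. This is a minimal, regex-based sketch under simple assumptions (a sentence ends at ".", "!" or "?" followed by whitespace; a token is a run of word characters or a single punctuation mark), not the algorithm any particular library uses.

```python
import re

def segment_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenise(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = ("One morning I shot an elephant in my pajamas. "
        "How he got in my pajamas, I don't know.")
for sentence in segment_sentences(text):
    print(tokenise(sentence))
```

The shortcuts show why this step is harder than it looks: an abbreviation such as "Dr." would wrongly end a sentence, and "don't" becomes three tokens. NLTK's `sent_tokenize` and `word_tokenize` handle such cases more carefully.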

1.3 Three Language Elements NLP Must Identify

🏷️
Entity
A noun — person, place, or thing. Not an adjective or verb.
Example: "elephant", "pajamas".
🔗
Relationship
A group of two or more entities with a strong connection.
Example: "I + shot" — who shot what.
💭
Concept
Something implied but not stated. Tricky: needs idea-matching, not word-matching.
Example: "Safari" and "rifle" from the hunting joke.
Groucho Marx joke: "One morning I shot an elephant in my pajamas. How he got in my pajamas, I don't know."
Entities: morning, elephant, pajamas
Relationships: I + shot, I + pajamas, in + pajamas
Implied concepts: Safari, Rifle (never stated but understood).
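The difference between word-matching (entities) and idea-matching (concepts) can be shown with a toy sketch. Both lexicons below, `NOUNS` and `CONCEPT_CUES`, are hypothetical and hand-written for this one joke; a real NLP system learns such associations from large amounts of text.

```python
# Hypothetical, hand-made lexicons for this single example.
NOUNS = {"morning", "elephant", "pajamas"}
CONCEPT_CUES = {
    frozenset({"shot", "elephant"}): ["safari", "rifle"],  # a hunting scene
}

def find_entities(tokens):
    """Word-matching: keep tokens that appear in the noun lexicon."""
    return [t for t in tokens if t in NOUNS]

def infer_concepts(tokens):
    """Idea-matching: return concepts implied when all cue words co-occur."""
    present = set(tokens)
    concepts = []
    for cues, implied in CONCEPT_CUES.items():
        if cues <= present:  # every cue word appears in the text
            concepts.extend(implied)
    return concepts

tokens = "one morning i shot an elephant in my pajamas".split()
print(find_entities(tokens))   # ['morning', 'elephant', 'pajamas']
print(infer_concepts(tokens))  # ['safari', 'rifle']
```

Note that "safari" and "rifle" never appear in `tokens`; they are reached only through the combination of "shot" and "elephant".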

1.4 Emotion Detection vs Sentiment Analysis

Aspect | Emotion Detection | Sentiment Analysis
Definition | Identifies distinct types of human emotion. | Measures the strength / polarity of the feeling expressed.
Examples | Determines whether an expression shows anger, happiness, or fear. | Assesses data as positive, negative, or neutral.
Use Cases | Analysing user ratings and survey comments. | Social-media posts, customer-service chats, product reviews.
AI Training | Classifies emotions into categories. | Uses a sliding scale between positive and negative.
Purpose | Identifies emotional tokens to understand context. | Assesses the overall tone of the text.
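To make the contrast in the table concrete, here is a minimal lexicon-based sketch: emotion detection returns *which* categories appear, while sentiment analysis returns a score on a sliding scale plus a polarity label. The word lists and scores are hypothetical; real systems learn them from labelled data rather than using a fixed dictionary.

```python
import re

# Hypothetical mini-lexicons for illustration only.
EMOTION_WORDS = {"angry": "anger", "furious": "anger",
                 "happy": "happiness", "delighted": "happiness",
                 "afraid": "fear", "scared": "fear"}
POLARITY = {"angry": -1, "furious": -2, "happy": 1, "delighted": 2,
            "afraid": -1, "scared": -1, "good": 1, "bad": -1}

def words(text):
    """Lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def detect_emotions(text):
    """Emotion detection: which emotion categories does the text express?"""
    return sorted({EMOTION_WORDS[w] for w in words(text) if w in EMOTION_WORDS})

def sentiment(text):
    """Sentiment analysis: a score on a sliding scale, plus its polarity label."""
    score = sum(POLARITY.get(w, 0) for w in words(text))
    if score > 0:
        return score, "positive"
    if score < 0:
        return score, "negative"
    return score, "neutral"

review = "I was scared at first, but now I am delighted!"
print(detect_emotions(review))  # ['fear', 'happiness']
print(sentiment(review))        # (1, 'positive')
```

The same review contains two emotions, yet its overall sentiment is a single mildly positive score.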

1.5 The Classification Problem — Language Ambiguity

Human language is full of terms with vague or multiple meanings. Deciding which meaning applies is called a classification problem. Humans resolve ambiguity easily from context; machines struggle.
Ambiguous riddle: "Why does your nose run and your feet smell?"
"Run" and "smell" each have several meanings. Other confusing examples: we "ship" a box even when it travels by train, and we "fill in" a form by filling it out.

How AI Solves the Classification Problem

1. Supervised Learning
AI uses supervised ML on large labelled language datasets.
2. Pattern Recognition
Learns patterns between words, phrases & their meanings.
3. Incremental Improvement
With more data, the AI adjusts its parameters and improves accuracy.
4. Confidence Values
Never perfect: a well-designed AI also returns a confidence score with each classification.
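The four steps above can be illustrated with a tiny supervised classifier that decides which sense of "run" a sentence uses and returns a confidence value. The source does not name an algorithm; this sketch swaps in Naive Bayes with add-one smoothing, one of the simplest supervised methods. The six labelled sentences and the label names `move` / `flow` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled examples: which sense of "run" is meant?
TRAINING = [
    ("i run every morning in the park", "move"),
    ("she can run faster than her brother", "move"),
    ("we run to catch the bus", "move"),
    ("my nose will run when i have a cold", "flow"),
    ("water can run down the wall", "flow"),
    ("tears run down her face", "flow"),
]

def train(examples):
    """Learning step: count labels, and words per label."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(model, text):
    """Return (best_label, confidence), where confidence is a probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(count / total)  # prior: how common the label is
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing: an unseen (word, label) pair is unlikely, not impossible
                score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Turn log scores into probabilities that add up to 1
    top = max(log_scores.values())
    probs = {label: math.exp(s - top) for label, s in log_scores.items()}
    total_prob = sum(probs.values())
    best = max(probs, key=probs.get)
    return best, probs[best] / total_prob

model = train(TRAINING)
print(classify(model, "my nose started to run"))       # ('flow', ~0.67)
print(classify(model, "i run in the park every day"))  # ('move', ~0.92)
```

Adding more labelled sentences to `TRAINING` changes the counts and therefore the confidence values, which is the "incremental improvement" step in miniature. A sentence made only of unseen words gets a confidence of 0.5, signalling that the model is guessing.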

1.6 Chatbots

Chatbots are software applications that simulate conversation with humans through text or voice. Many use AI, NLP, and ML to understand user queries and reply. They are integrated into websites, messaging apps, and voice assistants.

Two Types of Chatbots

Aspect | 🧩 Rule-Based Chatbot | 🤖 AI-Powered Chatbot
How it works | Predefined rules & decision trees; responds to specific user input patterns. | Uses NLP + ML algorithms. Also called "chat agents" / "virtual assistants".
Advantages | Easy to develop and maintain · Consistent, accurate answers for specific questions. | 24/7 availability · Personalised interactions · Efficient & cost-saving.
Limitations | Struggles with complex language · Cannot adapt beyond programmed rules. | High development cost · Bias from training data · Privacy & ethics concerns.
Use Cases | Customer service (FAQs, order status) · Guiding users through specific processes. | Entertainment & Gaming · Finance & Banking · Healthcare assistants.

1.7 Structure of a Chatbot — Frontend and Backend

🖥️ Frontend
The messaging channel — user-facing interface. Receives input and displays replies.
Limitation: may lack contextual understanding beyond immediate input.
⚙️ Backend
Where the heavy lifting happens — application logic + memory for conversation state.
Remembers earlier parts of the conversation as dialogue continues.
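The split between a frontend that only passes text and a backend that holds logic plus conversation memory can be sketched as a small class. `ChatBackend`, its `history` and `slots` attributes, and the "my name is" rule are all hypothetical, chosen only to show state carried across turns.

```python
class ChatBackend:
    """Hypothetical backend: application logic plus memory of the conversation."""

    def __init__(self):
        self.history = []  # every (speaker, message) pair so far
        self.slots = {}    # facts remembered across turns

    def handle(self, message):
        self.history.append(("user", message))
        text = message.lower()
        if text.startswith("my name is "):
            self.slots["name"] = message[len("my name is "):].strip()
            reply = f"Nice to meet you, {self.slots['name']}!"
        elif "what is my name" in text:
            name = self.slots.get("name")  # recall an earlier turn
            reply = f"Your name is {name}." if name else "You haven't told me yet."
        else:
            reply = "Tell me your name, or ask me what it is."
        self.history.append(("bot", reply))
        return reply

# The "frontend" here is just a loop that sends text in and shows replies.
backend = ChatBackend()
for line in ["What is my name?", "My name is Arti", "What is my name?"]:
    print("You:", line)
    print("Bot:", backend.handle(line))
```

The same question gets two different answers because the backend remembers the middle turn; a frontend on its own could not do this.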

1.8 Three Building Blocks — Intent · Entity · Dialog

🎯
Intent
A purpose / reason why the user is contacting the chatbot. Think of it as a verb.
Example: ask about operating hours, file a complaint.
🏷️
Entity
A noun — person, place, thing mentioned in the user input.
Example: "Bangalore office" → Bangalore is the entity.
💬
Dialog
An IF / THEN flowchart — map of each possible user input → chatbot response.
Conversation is stored as nodes, each with a statement + possible replies.
Restaurant-chain chatbot example:
Intent: "Open" — the user wants to know business hours.
Possible user inputs: "When do you open?" · "What are your hours?" · "You open now?" · "How late are you open?" · "Can I walk in at 7 pm?"
Entities: Bangalore (location), Schedule, Time.
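The three building blocks can be wired together in a toy restaurant bot. This sketch assumes simple word-overlap intent matching against example phrases, a fixed list of locations, and a regular expression for times; the `complaint` intent, the `LOCATIONS` set, and every reply string are hypothetical. Platforms such as Dialogflow use trained NLP models for the same job.

```python
import re

# Intents: example phrases per intent ("open" uses the inputs listed above).
INTENTS = {
    "open": ["when do you open", "what are your hours", "you open now",
             "how late are you open", "can i walk in at 7 pm"],
    "complaint": ["i want to file a complaint", "my order was wrong"],
}
LOCATIONS = {"bangalore", "mumbai", "delhi"}  # hypothetical location entities

def words(text):
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def detect_intent(text):
    """Intent: the one whose example phrase shares the most words with the input."""
    user, best, best_overlap = words(text), None, 0
    for intent, examples in INTENTS.items():
        for example in examples:
            overlap = len(user & words(example))
            if overlap > best_overlap:
                best, best_overlap = intent, overlap
    return best

def extract_entities(text):
    """Entities: a known location and a clock time such as '7 pm'."""
    found = {}
    places = sorted(words(text) & LOCATIONS)
    if places:
        found["location"] = places[0].title()
    time = re.search(r"\b\d{1,2}\s*(?:am|pm)\b", text.lower())
    if time:
        found["time"] = time.group()
    return found

def dialog(text):
    """Dialog: IF/THEN nodes keyed on the detected intent and entities."""
    intent, ents = detect_intent(text), extract_entities(text)
    if intent == "open":
        if "location" not in ents:
            return "Which outlet do you mean?"  # follow-up node
        when = f" at {ents['time']}" if "time" in ents else " today"
        return f"Checking whether our {ents['location']} outlet is open{when}."
    if intent == "complaint":
        return "Sorry about that. I'll log your complaint."
    return "Sorry, I can help with opening hours or complaints."

print(dialog("Can I walk in to the Bangalore outlet at 7 pm?"))
print(dialog("How late are you open?"))
```

Each `if` branch in `dialog` plays the role of one node in the IF / THEN flowchart; the missing-location branch is a node that asks a follow-up question instead of answering.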

1.9 The Five Phases of NLP

NLP processes human language through five sequential phases. Modern NLP systems often use machine-learning techniques such as deep neural networks in these phases, loosely mimicking how the human brain understands language.

🔤 1. Lexical: Break into words
📐 2. Syntactical: Check grammar
💡 3. Semantic: Extract meaning
🔗 4. Discourse: Use context
🌍 5. Pragmatic: Real-world intent

Phase 1 — Lexical Analysis (Tokenisation & Morphology)

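Understands and examines the structure of words by breaking text into paragraphs, phrases, and words. Common techniques:
  • Stemming: cuts a word down to its root form (e.g. "playing" → "play").
  • Lemmatization: reduces a word to its dictionary form, using its part of speech (e.g. "better" as an adjective → "good").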
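Stemming and lemmatization, two common lexical techniques, can be contrasted with a toy sketch. The suffix-stripping `stem` function is a crude stand-in for real algorithms such as the Porter stemmer, and the `LEMMAS` table is a hypothetical dictionary keyed on (word, part of speech). NLTK offers real implementations as `PorterStemmer` and `WordNetLemmatizer`; the latter needs the WordNet data download.

```python
# Toy stemmer: strip the first matching suffix.
SUFFIXES = ["ing", "ies", "ed", "es", "s"]

def stem(word):
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words like "is" survive
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

# Toy lemmatizer: hypothetical dictionary keyed on (word, part of speech).
LEMMAS = {("studies", "verb"): "study", ("studying", "verb"): "study",
          ("ran", "verb"): "run", ("better", "adj"): "good",
          ("meeting", "noun"): "meeting", ("meeting", "verb"): "meet"}

def lemmatize(word, pos):
    """Return the dictionary form; fall back to the word itself if unknown."""
    return LEMMAS.get((word.lower(), pos), word.lower())

for w in ["studies", "studying", "played", "ran"]:
    print(w, "-> stem:", stem(w), "| lemma:", lemmatize(w, "verb"))
print(lemmatize("meeting", "noun"), lemmatize("meeting", "verb"))  # meeting meet
```

The output shows the trade-off: stemming is fast but can produce non-words ("studies" → "stud") and cannot handle irregular forms ("ran"), while lemmatization returns real dictionary words but needs a dictionary and the word's part of speech.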

Phase 2 — Syntactical Analysis (Parsing & Grammar)

Checks grammar, word order, word relationships. Uses dependency grammar and Part-of-Speech (POS) tags.

Example: "Mumbai travels to Anuj." is rejected by the syntactic analyser: although its word order looks valid, the sentence makes no sense, since a city cannot travel to a person.
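A syntactic check can be sketched by tagging each word from a tiny lexicon (Penn Treebank-style tags, as NLTK uses) and matching the tag sequence against one grammar pattern. The `LEXICON` and the single pattern `Det? Noun Verb (Prep Det? Noun)?` are hypothetical simplifications; real parsers use full grammars or dependency models. A pure word-order check like this would also accept a sentence shaped like "Mumbai travels to Anuj." if its words were in the lexicon; catching that a city cannot travel to a person needs extra checks on word meaning.

```python
import re

# Hypothetical mini-lexicon: word -> POS tag.
LEXICON = {"the": "DT", "a": "DT", "cat": "NN", "mat": "NN", "dog": "NN",
           "sat": "VBD", "slept": "VBD", "on": "IN", "under": "IN"}

# Toy grammar: optional determiner + noun, a verb, then an optional
# prepositional phrase (preposition + optional determiner + noun).
SENTENCE = re.compile(r"^(DT )?NN VBD( IN (DT )?NN)?$")

def is_grammatical(sentence):
    tags = []
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word not in LEXICON:
            return False  # unknown word: this toy parser cannot analyse it
        tags.append(LEXICON[word])
    return bool(SENTENCE.match(" ".join(tags)))

print(is_grammatical("The cat sat on the mat."))  # True
print(is_grammatical("cat the on sat mat the"))   # False
```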

Phase 3 — Semantic Analysis

Works out the literal meaning of a sentence from the meanings of its words and how they combine, so the system can grasp the intended message.

Phase 4 — Discourse Integration

Understands a statement's meaning based on preceding sentences. Helps interpret pronouns and proper-noun references.

Example: "Arti wants it." — "it" depends on a previous sentence. Without that context, "it" is meaningless.
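Discourse integration can be sketched as pronoun resolution: replace "it" with the most recently mentioned thing in the previous sentence. This recency heuristic and the `THINGS` noun list are hypothetical simplifications of real coreference resolution, and the preceding sentences used below are invented to give "Arti wants it." some context.

```python
# Hypothetical list of "thing" nouns that a pronoun like "it" can refer to.
THINGS = {"bicycle", "book", "cake", "phone"}

def resolve_it(previous_sentence, sentence):
    """Replace 'it' with the most recently mentioned thing in the previous sentence."""
    prev_words = previous_sentence.lower().rstrip(".!?").split()
    candidates = [w for w in prev_words if w in THINGS]
    if not candidates:
        return sentence  # no antecedent found: 'it' stays unresolved
    antecedent = "the " + candidates[-1]  # recency heuristic: pick the last one
    words = sentence.rstrip(".!?").split()
    return " ".join(antecedent if w.lower() == "it" else w for w in words) + "."

print(resolve_it("The shop has a new red bicycle.", "Arti wants it."))
# Arti wants the bicycle.
print(resolve_it("Arti smiled.", "Arti wants it."))
# Arti wants it.
```

With "Anuj bought a book and a cake." as the previous sentence, the heuristic picks "the cake", which may not be what the speaker meant. That is exactly the kind of ambiguity real discourse models try to reduce.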

Phase 5 — Pragmatic Analysis

Studies meanings in a particular situation. Recognises how individuals communicate, the context, the speaker, and many other real-world factors — "who said what to whom".

1.10 Walkthrough Example — "The cat sat on the mat"

Phase | What happens | Result
1. Lexical | Tokenisation + morphological analysis. | Tokens: [The, cat, sat, on, the, mat]
2. Syntactic | Parse the grammar; identify subject, verb, and prepositional phrase. | Subject: "The cat" · Verb: "sat" · Phrase: "on the mat"
3. Semantic | Extract meaning from the individual words and how they combine. | cat = animal · sat = action · mat = object
4. Discourse | Use context from surrounding sentences. | If the previous sentence was "It was raining outside", the model may infer the cat sat inside to avoid the rain.
5. Pragmatic | Interpret in the real world, based on speaker and context. | Casual chat: a description of the cat's action · Pet-query chatbot: suggest a comfortable pet mat.
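The walkthrough table can be mirrored row by row in a deliberately hard-coded toy. It assumes every input has the shape "Det Noun Verb Prep Det Noun" (like "The cat sat on the mat."); the role names, the rain inference, and the two setting strings are hypothetical. The point is only that each phase feeds the next and that phases 4 and 5 change the result without changing the sentence.

```python
def analyse(sentence, previous_sentence="", setting="casual chat"):
    """Toy five-phase walkthrough for a 'Det Noun Verb Prep Det Noun' sentence."""
    # 1. Lexical: split into tokens
    tokens = sentence.rstrip(".").split()
    # 2. Syntactic: first two tokens = subject, third = verb, rest = phrase
    subject, verb, phrase = " ".join(tokens[:2]), tokens[2], " ".join(tokens[3:])
    # 3. Semantic: assign roles to the words
    roles = {"agent": tokens[1], "action": verb, "location": tokens[-1]}
    # 4. Discourse: use the preceding sentence as context
    if "rain" in previous_sentence.lower():
        roles["inference"] = "indoors, out of the rain"
    # 5. Pragmatic: the same sentence calls for a different reply per setting
    if setting == "pet-query chatbot":
        reply = f"Want a comfortable {roles['location']} for your {roles['agent']}?"
    else:
        reply = f"{subject} {verb} {phrase}."
    return {"tokens": tokens, "subject": subject, "verb": verb,
            "phrase": phrase, "roles": roles, "reply": reply}

print(analyse("The cat sat on the mat.")["reply"])
print(analyse("The cat sat on the mat.", "It was raining outside.",
              "pet-query chatbot"))
```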

1.11 Applications of NLP

Businesses use NLP to analyse data, discover insights, automate operations, and gain competitive advantage.

😊
Sentiment Analysis: Evaluate consumer comments, social posts, and reviews to gauge positive / negative / neutral sentiment.
🎙️
Voice Assistants: Siri · Alexa · Google Assistant · ODA · Cortana. Make calls, set reminders, schedule meetings, set alarms, browse the web.
📧
Email Filtering: Classifies incoming emails as "important" or "spam" and sorts them accordingly.
📄
Document Analysis: Classifies and categorises large inflows of data in schools, companies, and institutions, aiding claim and risk decisions.
📝
Automatic Summarisation: Summarises information efficiently and can even identify hidden emotional meaning in text.
🔍
Grammar/Spell Check: Grammarly, built-in Word tools, predictive text on keyboards.
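Automatic summarisation can be sketched with one simple extractive technique: score each sentence by how frequent its (non-stop) words are across the whole text, then keep the top sentences in their original order. The stop-word list and sample article are hypothetical, and this word-frequency method is only one approach; many modern summarisers instead generate new sentences with neural models.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list: common words that carry little meaning
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "was", "of", "to", "in",
              "it", "on", "for", "with", "that", "this", "by", "as"}

def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def summarise(text, n_sentences=1):
    """Extractive summary: keep the sentences whose words are most frequent overall."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / (len(words) or 1)  # average frequency

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)

article = ("NLP helps computers understand language. "
           "NLP powers chatbots and NLP powers search. "
           "The weather was pleasant.")
print(summarise(article))
# NLP powers chatbots and NLP powers search.
```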

1.12 Practical — POS Tagging with NLTK

Practical (Syllabus - Advanced): Write a Python program to print the Part-of-Speech (POS) tags of a statement.
pip install nltk

import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

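# Download the tokeniser and tagger models. Recent NLTK releases look for
# "punkt_tab" and "averaged_perceptron_tagger_eng"; older releases use the
# original names, so fetching all four works on either.
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("averaged_perceptron_tagger_eng", quiet=True)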

sentence = "The quick brown fox jumps over the lazy dog."

# Tokenise
words = word_tokenize(sentence)

# Tag each word with its POS
pos_tags = pos_tag(words)

for word, pos in pos_tags:
    print(f"Word: {word}, POS: {pos}")
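Word: The, POS: DT
Word: quick, POS: JJ
Word: brown, POS: NN
Word: fox, POS: NN
Word: jumps, POS: VBZ
Word: over, POS: IN
Word: the, POS: DT
Word: lazy, POS: JJ
Word: dog, POS: NN
Word: ., POS: .
Note that the tagger labels "brown" as NN here even though it describes the fox: statistical taggers are right most of the time, not always.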

Common POS Tag Meanings

Tag | Meaning | Example
DT | Determiner | the, a, an
NN | Noun (singular) | dog, fox, mat
JJ | Adjective | quick, brown, lazy
VB / VBZ | Verb (base form) / verb, 3rd-person singular present | jump, run / jumps, runs
IN | Preposition / subordinating conjunction | over, in, on

1.13 Practical — Simple Rule-Based Chatbot in Python

Practical (Syllabus - Advanced): Create a simple rule-based chatbot using Python.
def get_response(user_input):
    """Return a reply by matching keywords in the user's message."""
    user_input = user_input.lower()
    if "hello" in user_input:
        return "Hi there! How can I assist you?"
    elif "how are you" in user_input:
        return "I'm just a bot, but thanks for asking!"
    elif "bye" in user_input:
        return "Goodbye! Have a great day!"
    else:
        return "I'm sorry, I didn't understand that."

def main():
    print("Welcome to the Simple Chatbot!")
    print("Type 'bye' to exit.")

    while True:
        user_input = input("You: ")
        response = get_response(user_input)
        print("Chatbot:", response)
        # Stop once the bot has said goodbye (also covers inputs like "ok bye")
        if response.startswith("Goodbye"):
            break

if __name__ == "__main__":
    main()
Welcome to the Simple Chatbot!
Type 'bye' to exit.
You: hello good afternoon
Chatbot: Hi there! How can I assist you?
You: how are you
Chatbot: I'm just a bot, but thanks for asking!
You: bye
Chatbot: Goodbye! Have a great day!

1.14 Activities & Certification (Syllabus)

From the syllabus practical list:
  • Write an article on "IBM Project Debater — Interesting Facts".
  • Create an ice-cream ordering chatbot using any of:
    • Google Dialogflow
    • Botsify.com
    • Botpress.com
  • Program to print POS tags (§1.12 above) — Advanced Learners.
  • Simple rule-based chatbot in Python (§1.13 above) — Advanced Learners.
  • Earn a credential on IBM SkillsBuild — Natural Language Processing.
Fun AI experiment: Try Verse by Verse at sites.research.google/versebyverse — an experimental AI-powered muse that helps you write poetry inspired by classic American poets.

Quick Revision — Key Points to Remember

  • Linguistics studies language structure, semantics, pragmatics, sociolinguistics.
  • NLP = branch of AI that lets computers understand, create, and manipulate human language. Works on text + speech. Also called "language-in".
  • NLP examples: Siri, Alexa, Cortana, Google Search, email spam filtering, auto-translate, document summarisation, sentiment analysis, grammar checking.
  • Why hard: human language is unstructured — ambiguous, metaphoric, cultural.
  • 3 NLP handling steps: Sentence Segmentation · Tokenisation · Structure & Classify.
  • 3 elements NLP identifies: Entity (noun) · Relationship (2+ entities linked) · Concept (implied meaning).
  • Emotion Detection = which emotion type · Sentiment Analysis = how strong/polarity (positive/negative/neutral).
  • Classification problem = ambiguous/multi-meaning words. Solved by supervised ML + confidence scores.
  • Chatbots = software that simulates conversation. Two types: Rule-based (rigid, consistent) and AI-powered (flexible, 24/7).
  • Chatbot structure: Frontend (UI) + Backend (logic + memory).
  • 3 building blocks: Intent (purpose - verb) · Entity (person/place/thing - noun) · Dialog (IF/THEN flowchart of nodes).
  • 5 phases of NLP: Lexical (tokens) · Syntactical (grammar) · Semantic (meaning) · Discourse (context) · Pragmatic (real-world interpretation).
  • Lexical techniques: Stemming (root form) · Lemmatization (dictionary form with POS).
  • 6 applications: Sentiment analysis · Voice assistants · Email filtering · Document analysis · Automatic summarisation · Grammar/spell checking.
  • Python libraries: NLTK (Natural Language Toolkit) — word_tokenize, pos_tag.
  • Common POS tags: NN (noun) · JJ (adjective) · VB/VBZ (verb) · DT (determiner) · IN (preposition).
  • Chatbot platforms: Google Dialogflow · Botsify · Botpress.
  • Certification: IBM SkillsBuild — Natural Language Processing.