Introduction — What Is Generative AI?
We have all seen celebrities targeted with fake images and people claiming AI-generated content as their own. This unit explores a new dimension of AI — Generative AI — and the principles behind all these tools.
🔹 Key Concepts You'll Learn
- Introduction to Generative AI
- Working of Generative AI
- Generative and Discriminative models
- Applications of Generative AI
- LLM — Large Language Model
- Future of Generative AI
- Ethical and Social Implications of Generative AI
Prerequisites: AI concepts from Class XI + basic Python (installing/importing packages).
1.1 Working of Generative AI
Generative AI learns patterns from data and autonomously generates similar samples. It is built on deep learning, using neural networks to capture intricate patterns. Two key model families:
⚔️ 1. Generative Adversarial Networks (GANs)
A neural-network architecture with two networks that compete:
- Generator — creates new data samples (images, text) — the "fake".
- Discriminator — evaluates samples to distinguish real from fake.
Through adversarial training, GANs learn to produce samples indistinguishable from real data. Applied in image generation, style transfer, data augmentation.
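The adversarial loop can be sketched on toy 1-D data. This is a deliberately minimal illustration, not a production GAN: the linear generator, logistic discriminator, learning rate, and data values are all illustrative choices. The generator learns to shift random noise toward the real data's mean while the discriminator tries to tell real from fake.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1 / (1 + np.exp(-t))

# "Real" data: 1-D samples centred on 4.0.
real = rng.normal(4.0, 0.5, size=256)

a, c = 1.0, 0.0   # Generator G(z) = a*z + c
w, b = 0.1, 0.0   # Discriminator D(x) = sigmoid(w*x + b)
lr = 0.02

for step in range(2000):
    z = rng.normal(size=256)
    fake = a * z + c

    # Discriminator update: push D(real) -> 1 and D(fake) -> 0.
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)
    g_logit = np.concatenate([d_real - 1.0, d_fake])  # BCE gradient (pred - label)
    x_all = np.concatenate([real, fake])
    w -= lr * np.mean(g_logit * x_all)
    b -= lr * np.mean(g_logit)

    # Generator update: push D(fake) -> 1, i.e. fool the discriminator.
    d_fake = sigmoid(w * (a * z + c) + b)
    g_fake = d_fake - 1.0   # gradient of -log D(fake) w.r.t. its logit
    a -= lr * np.mean(g_fake * w * z)
    c -= lr * np.mean(g_fake * w)

print(round(c, 1))  # the generator's offset drifts from 0 toward the real mean
```

The generator never sees the real data directly; it improves only through the discriminator's feedback, which is the essence of adversarial training.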
🔄 2. Variational Autoencoders (VAEs)
A neural-network architecture with two complementary parts:
- Encoder — converts data into a hidden latent space (compressed representation).
- Decoder — translates information back from the latent space into its original form.
VAEs focus on capturing underlying patterns. Applications: data generation · anomaly detection · filling in missing information.
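The encode → latent → decode pipeline can be sketched with untrained toy weights (the encoder and decoder formulas here are purely illustrative), including the reparameterisation trick VAEs use to sample from the latent space:

```python
import numpy as np

rng = np.random.default_rng(1)

def encoder(x):
    # Toy encoder: compress the input to a 1-D latent mean and log-variance.
    # (Real VAEs learn these mappings; the formulas here are made up.)
    mu = x.mean() / 2
    log_var = -1.0
    return mu, log_var

def decoder(z):
    # Toy decoder: expand the latent value back to the input's shape.
    return np.full(4, 2.0 * z)

x = np.array([1.0, 2.0, 3.0, 4.0])
mu, log_var = encoder(x)

# Reparameterisation trick: z = mu + sigma * eps keeps sampling differentiable.
eps = rng.normal()
z = mu + np.exp(0.5 * log_var) * eps

x_hat = decoder(z)        # reconstruction of the input
z_new = rng.normal()      # sample a fresh latent point...
x_gen = decoder(z_new)    # ...to GENERATE a brand-new sample
print(x_hat.shape, x_gen.shape)  # (4,) (4,)
```

Because the latent space is a smooth distribution, sampling any new point from it and decoding yields new data, which is exactly how VAEs generate.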
🔹 GANs vs VAEs — Which to Choose?
- GANs — excellent for visually realistic outputs.
- VAEs — better for structured data generation and tasks requiring interpretable latent spaces.
2.1 Applications of Generative AI — 4 Content Types
🖼️ 1. Image Generation
Computers produce new pictures resembling ones they have seen. They analyse the characteristics of input images and generate new images with similar features.
✍️ 2. Text Generation
Computers write sentences that sound human-written. They analyse existing text to produce coherent, contextually relevant text.
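The idea can be demonstrated with a toy word-level model, a deliberately simple stand-in for the neural networks real tools use: learn which words follow which in a tiny corpus, then sample new text that follows the same patterns.

```python
import random
from collections import defaultdict

# Tiny training corpus (invented for illustration).
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat saw the dog .").split()

# "Learn" the patterns: record which words follow which.
model = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    model[prev].append(nxt)

# "Generate": repeatedly sample a plausible next word.
random.seed(0)
word, out = "the", ["the"]
for _ in range(8):
    word = random.choice(model[word])
    out.append(word)
print(" ".join(out))  # new text that follows the corpus's word patterns
```

Real text generators replace this word-pair table with a neural network that scores every possible next word given the whole context, but the generate-one-token-at-a-time loop is the same.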
🎬 3. Video Generation
Create new videos by learning from existing ones — animations, visual effects, realistic visuals.
🎵 4. Audio Generation
Generate fresh audio content — music, sound effects, speech — inspired by existing recordings.
3.1 Two Broad Model Categories
- Discriminative models — define class boundaries in data; suited for classification. Learn features that differentiate classes (e.g., spam vs not-spam emails).
- Generative models — comprehend the underlying data distribution and generate new samples.
3.2 Generative AI vs Discriminative AI — Comparison
| Aspect | Generative AI | Discriminative AI |
|---|---|---|
| Purpose | Creates new things (images, stories); finds unusual things; learns without precise guidance. | Determines what something is by looking at features; good at telling things apart. |
| Models | Networks that compete (GAN) or guess from patterns to create new things. | Finds rules to separate things — e.g., dog vs cat. |
| Training Focus | Understands what makes data unique; creates new data similar-but-different. | Learns to draw lines / rules that tell things apart by features. |
| Applications | New artworks · new story ideas · anomaly detection in data. | Facial recognition · speech recognition · spam classification. |
| Example Algorithms | Naïve Bayes · Gaussian discriminant analysis · GAN · VAE · LLM · DBM · Autoregressive models | Logistic Regression · Decision Trees · SVM · Random Forest |
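The contrast can be made concrete with toy 1-D "spam score" data (the single feature and all numbers are invented for illustration): the generative view models each class's distribution, which also lets us sample new data, while the discriminative view only learns the boundary between classes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two classes of 1-D data, e.g. a "link count" feature per email.
spam = rng.normal(8.0, 1.0, 500)
ham = rng.normal(2.0, 1.0, 500)

# Generative view: model each class's distribution -> we can SAMPLE new data.
mu_spam, sd_spam = spam.mean(), spam.std()
new_spam_like = rng.normal(mu_spam, sd_spam, 3)  # three generated samples

# Discriminative view: only learn the boundary that separates the classes.
threshold = (spam.mean() + ham.mean()) / 2

def classify(x):
    return "spam" if x > threshold else "not spam"

print(classify(7.5))  # → spam
```

Note how the generative model can do the discriminative job too (compare which class distribution a point fits better), but the discriminative model cannot generate: it knows the boundary, not the data.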
4.1 Ethical and Social Implications of Generative AI
🎭 1. Deepfake Technology
Tools like DeepFaceLab and FaceSwap raise concerns about the authenticity of digital content. Deepfake algorithms can generate compelling fake images/audio/videos — jeopardising trust in media and spreading misinformation.
⚖️ 2. Bias and Discrimination
Generative AI models (e.g., Clearview AI's facial-recognition algorithms) have shown biases that disproportionately affect certain demographic groups — perpetuating inequalities and stereotypes.
📝 3. Plagiarism
Presenting AI-generated content as one's own (intentionally or not) raises ethical questions about intellectual property and academic integrity. If AI output significantly resembles copyrighted material, it could infringe copyright law.
🔍 4. Transparency
Disclosing the use of AI-generated content is essential — especially in academic and professional settings — to uphold ethical standards and prevent academic dishonesty. Failing to disclose AI use erodes trust and credibility.
4.2 Points to Remember + Citing AI Sources
- Be cautious and transparent when using Generative AI.
- Respect copyright — do not present AI output as your own.
- Consult your teacher/institution for specific guidelines.
🔹 Citation Rules
- Intellectual Property — attribute AI-generated content properly; comply with copyright laws.
- Accuracy — verify AI-generated info; cite primary sources wherever possible.
- Ethical Use — acknowledge AI tools; provide context for generated content.
🔹 Citation Example (APA Style)
- Treat the AI as author — cite the tool name (e.g., Bard) + "Generative AI tool" in the author spot.
- Date it right — use the date you received the AI-generated content.
- Show your prompt — briefly mention the prompt given (optional).
Bard (Generative AI tool). (2024, February 20). How to cite generative AI in APA style.
5.1 Hands-on Exploration — 5 Generative AI Tools
🎨 Activity 1 · Canva for Education (Text → Image)
- Visit Canva for Education, sign up with your school email.
- Verify the email, log in.
- Open the Text to Image tool in the design-tools menu.
- Type a prompt — e.g., "A red balloon floating in a clear blue sky."
- Click Generate Image. Canva AI produces multiple options.
- Customise (colours, shapes, backgrounds) and download, or use inside a Canva design.
💬 Activity 2 · Google Gemini (Text Generation)
- Sign in to your Google account.
- Navigate to the Google Gemini platform.
- Agree to the terms of service + privacy policy.
- Craft a descriptive, specific prompt.
- Click the text-generation option, wait for Gemini to process and review output.
🎵 Activity 3 · Veed AI (Text & Music)
- Visit the Veed website → sign up.
- Open the AI Text Generator.
- Enter a concise prompt — e.g., "Write a short story about a lost astronaut searching for their way home."
- Click Generate Text and review.
🎬 Activity 4 · Animaker (Video Generation)
- Visit animaker.com → sign up.
- Open the AI Video Generation tool.
- Enter a detailed prompt — e.g., "Create a promotional video for a new product launch with animated characters and dynamic visuals."
- Click Generate Video and customise.
🤖 Activity 5 · ChatGPT
Visit chat.openai.com → log in → start prompting for text, code, explanations or creative writing.
6.1 Large Language Model (LLM) — Definition
LLMs are deep-learning models that understand and generate human language. They are called "large" because of their scale: billions of parameters, trained on massive datasets of text and code, often spanning trillions of words. Dataset quality directly affects model performance.
🔹 Transformers in LLMs
Transformers — a neural-network architecture that revolutionised NLP and is the engine behind modern LLMs. They enable efficient learning of complex language patterns and long-range relationships in text data.
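The Transformer's core operation, scaled dot-product self-attention, can be sketched in a few lines. To keep the illustration minimal, there are no learned weight matrices here: queries, keys, and values are all set to the raw token embeddings, and the embeddings themselves are toy numbers.

```python
import numpy as np

def self_attention(X):
    """Minimal scaled dot-product self-attention over token embeddings X."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # pairwise token similarity
    # Softmax over each row: how much each token attends to every other token.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X             # each token becomes a mix of its context

# 3 "tokens" with 4-dimensional toy embeddings.
X = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [1., 1., 0., 0.]])
out = self_attention(X)
print(out.shape)  # (3, 4)
```

Because every token attends to every other token in one step, attention captures long-range relationships that older sequence models struggled with, which is why it became the engine of modern LLMs.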
6.2 Leading LLMs in 2024
- OpenAI's GPT-4o — multi-modal, excels at text + images.
- Google's Gemini 1.5 Pro — seamless multi-modal text/image/speech.
- Meta's LLaMA 3.1 — open-source, high efficiency on diverse AI tasks.
- Anthropic's Claude 3.5 — prioritises safety and interpretability.
- Mistral AI's Mixtral 8×7B — sparse mixture of experts for strong performance at smaller model sizes.
6.3 Applications of LLMs
- Text Generation — content creation, dialogue, story/poetry generation, translating natural language → working code, auto-completion for writing & email.
- Audio Generation — LLM-generated text can drive Text-to-Speech (TTS) systems to synthesise natural-sounding speech.
- Image Generation — LLMs do image captioning (generate textual descriptions for images); enhances accessibility.
- Video Generation — generate subtitles, captions, scene summaries; improves searchability.
6.4 Limitations of LLMs
- Processing text needs significant compute → high response time and cost.
- LLMs prioritise natural language over accuracy → can generate factually incorrect or misleading information with high confidence.
- May memorise specifics rather than generalise → poor adaptability.
6.5 Risks of LLMs
- Trained on Internet text → may exhibit biases; data-privacy concerns when personal information is processed.
- Sensitive data in training can inadvertently reveal confidential information.
- Crafted inputs (prompt injection) can lead to harmful or illogical outputs.
6.6 Case Study — LLaMA (Meta AI)
- Trained on publicly available text and code from the internet — fosters transparency and wider accessibility.
- Efficient training techniques require less compute — better scalability.
- Multiple model sizes — from 7 billion to 65 billion parameters; pick what suits your task + hardware.
- Smaller models for everyday tasks; larger models for complex NLP.
- Delivers impressive results on text summarisation & QA — competitive with some much larger proprietary LLMs.
- Applications: personalised learning, content creation, research assistance.
6.7 Practical — Create a Chatbot with Gemini API
Large e-commerce companies like Amazon use AI chatbots for instant customer queries. Beyond customer service, custom chatbots power education (tutoring), research (data analysis), healthcare (appointment booking), and general-public tools.
🔑 Step 1 · Obtain an API Key
- Visit aistudio.google.com → click Get API Key.
- Click Create API Key.
- Copy the key and keep it safe.
🐍 Step 2 · Set Up Python Environment
Install the Gemini SDK (the google-generativeai package):
pip install -q -U google-generativeai
📦 Step 3 · Import Libraries
import google.generativeai as genai
🔐 Step 4 · Initialise Gemini with Your Key
GOOGLE_API_KEY = "YOUR_API_KEY_HERE"
genai.configure(api_key=GOOGLE_API_KEY)
🧠 Step 5 · Choose a Gemini Model
- gemini-pro — optimised for text-only prompts.
- gemini-pro-vision — optimised for text-and-image prompts.
model = genai.GenerativeModel('gemini-pro')
💬 Step 6 · Create a Chat Session
chat = model.start_chat(history=[])
while True:
    user_input = input("You: ")
    if user_input.lower() in ['exit', 'quit']:
        break
    response = chat.send_message(user_input)
    print("Gemini:", response.text)
6.8 Future of Generative AI
- Evolving architectures to surpass current capabilities.
- Ethical development — minimise biases, ensure responsible use.
- Addressing complex challenges in healthcare, education, multilingual translation.
- Expansion in multimedia content creation.
- Deeper human-AI collaboration — AI as a supportive partner across domains.
- Primary objective of Generative AI → generate new data resembling its training samples.
- Common Gen-AI algorithms → GANs & VAEs.
- Purpose of Gen AI in text → create coherent, contextually relevant text from prompts.
- Gen-AI model famous for text → OpenAI's ChatGPT.
- Image generation analogy → computers make new pictures based on patterns they have learned.
- Not a Gen-AI application → analysing sentiment in social media (that's discriminative).
- Gen vs Disc training focus → Gen learns the data distribution; Disc identifies class boundaries.
- Tech raising authenticity concerns → Deepfake AI.
Quick Revision — Key Points to Remember
- Generative AI creates new content (text · image · audio · video) resembling its training samples.
- Famous tools: ChatGPT · Gemini · Claude · DALL-E.
- Two core architectures: GAN (Generator + Discriminator, adversarial) · VAE (Encoder + Decoder, latent space).
- GAN vs VAE: GAN = realistic visuals; VAE = structured data + interpretable latent space.
- 4 application domains: Image (DALL-E, Canva, Stable Diffusion) · Text (ChatGPT, Gemini, Perplexity) · Video (Lumiere, Deepfake) · Audio (Voicebox, Music LM).
- Generative vs Discriminative: Gen creates new data + learns distribution; Disc classifies + learns boundaries. Example Disc algos: Logistic Regression, Decision Trees, SVM, Random Forest. Example Gen algos: Naïve Bayes, GAN, VAE, LLM, DBM, Autoregressive.
- LLM = deep-learning model on trillions of words; core architecture = Transformer.
- Leading LLMs (2024): GPT-4o · Gemini 1.5 Pro · LLaMA 3.1 · Claude 3.5 · Mixtral 8×7B.
- LLM limitations: high compute · factual errors · memorisation over generalisation.
- LLM risks: bias · privacy leaks · prompt injection.
- LLaMA case study: publicly trained · efficient · sizes 7B–65B params.
- 4 Ethical concerns: Deepfake (DeepFaceLab, FaceSwap) · Bias (Clearview AI, HireVue) · Plagiarism · Transparency.
- Citing AI in APA: Tool-name (Generative AI tool). Date received. Topic. Prompt in brackets.
- Hands-on tools: Canva (Text→Image) · Gemini (text) · Veed (text/music) · Animaker (video) · ChatGPT.
- Gemini API chatbot pipeline: Get API key → pip install google-generativeai → import → configure(api_key) → GenerativeModel('gemini-pro') → start_chat → send_message.
- Gemini models: gemini-pro (text-only) · gemini-pro-vision (text + images).
- Future: ethical development · healthcare/education breakthroughs · multimodal creation · human-AI collaboration.
- Generative AI is a form of Weak AI.