Introduction — What Is Generative AI?
We have all seen celebrities targeted with fake images and people claiming AI-generated content as their own. This unit explores a new dimension of AI — Generative AI — and the principles behind all these tools.
🔹 Key Concepts You'll Learn
- Introduction to Generative AI
- Working of Generative AI
- Generative and Discriminative models
- Applications of Generative AI
- LLM — Large Language Model
- Future of Generative AI
- Ethical and Social Implications of Generative AI
Prerequisites: AI concepts from Class XI + basic Python (installing/importing packages).
1.1 Working of Generative AI
Generative AI learns patterns from data and autonomously generates similar samples. It is built on deep learning, using neural networks to capture intricate patterns. Two key model families:
⚔️ 1. Generative Adversarial Networks (GANs)
A neural-network architecture with two networks that compete:
- Generator — creates new data samples (images, text) — the "fake".
- Discriminator — evaluates samples to distinguish real from fake.
Through adversarial training, GANs learn to produce samples indistinguishable from real data. Applied in image generation, style transfer, data augmentation.
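The adversarial loop can be sketched on toy 1-D data. This is a deliberately minimal illustration, not a production GAN: the linear generator, logistic discriminator, learning rate, and data values are all illustrative choices. The generator learns to shift random noise toward the real data's mean while the discriminator tries to tell real from fake.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1 / (1 + np.exp(-t))

# "Real" data: 1-D samples centred on 4.0.
real = rng.normal(4.0, 0.5, size=256)

a, c = 1.0, 0.0   # Generator G(z) = a*z + c
w, b = 0.1, 0.0   # Discriminator D(x) = sigmoid(w*x + b)
lr = 0.02

for step in range(2000):
    z = rng.normal(size=256)
    fake = a * z + c

    # Discriminator update: push D(real) -> 1 and D(fake) -> 0.
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)
    g_logit = np.concatenate([d_real - 1.0, d_fake])  # BCE gradient (pred - label)
    x_all = np.concatenate([real, fake])
    w -= lr * np.mean(g_logit * x_all)
    b -= lr * np.mean(g_logit)

    # Generator update: push D(fake) -> 1, i.e. fool the discriminator.
    d_fake = sigmoid(w * (a * z + c) + b)
    g_fake = d_fake - 1.0   # gradient of -log D(fake) w.r.t. its logit
    a -= lr * np.mean(g_fake * w * z)
    c -= lr * np.mean(g_fake * w)

print(round(c, 1))  # the generator's offset drifts from 0 toward the real mean
```

The generator never sees the real data directly; it improves only through the discriminator's feedback, which is the essence of adversarial training.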
🔄 2. Variational Autoencoders (VAEs)
A neural-network architecture with two complementary parts:
- Encoder — converts data into a hidden latent space (compressed representation).
- Decoder — translates information back from the latent space into its original form.
VAEs focus on capturing underlying patterns. Applications: data generation · anomaly detection · filling in missing information.
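The encode → latent → decode pipeline can be sketched with untrained toy weights (the encoder and decoder formulas here are purely illustrative), including the reparameterisation trick VAEs use to sample from the latent space:

```python
import numpy as np

rng = np.random.default_rng(1)

def encoder(x):
    # Toy encoder: compress the input to a 1-D latent mean and log-variance.
    # (Real VAEs learn these mappings; the formulas here are made up.)
    mu = x.mean() / 2
    log_var = -1.0
    return mu, log_var

def decoder(z):
    # Toy decoder: expand the latent value back to the input's shape.
    return np.full(4, 2.0 * z)

x = np.array([1.0, 2.0, 3.0, 4.0])
mu, log_var = encoder(x)

# Reparameterisation trick: z = mu + sigma * eps keeps sampling differentiable.
eps = rng.normal()
z = mu + np.exp(0.5 * log_var) * eps

x_hat = decoder(z)        # reconstruction of the input
z_new = rng.normal()      # sample a fresh latent point...
x_gen = decoder(z_new)    # ...to GENERATE a brand-new sample
print(x_hat.shape, x_gen.shape)  # (4,) (4,)
```

Because the latent space is a smooth distribution, sampling any new point from it and decoding yields new data, which is exactly how VAEs generate.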
🔹 GANs vs VAEs — Which to Choose?
- GANs — excellent for visually realistic outputs.
- VAEs — better for structured data generation and tasks requiring interpretable latent spaces.
2.1 Applications of Generative AI — 4 Content Types
🖼️ 1. Image Generation
Computers produce new pictures resembling ones they have seen. They analyse the characteristics of input images and generate new images with similar features.
✍️ 2. Text Generation
Computers write sentences that sound human-written. They analyse existing text to produce coherent, contextually relevant text.
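The idea can be demonstrated with a toy word-level model, a deliberately simple stand-in for the neural networks real tools use: learn which words follow which in a tiny corpus, then sample new text that follows the same patterns.

```python
import random
from collections import defaultdict

# Tiny training corpus (invented for illustration).
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat saw the dog .").split()

# "Learn" the patterns: record which words follow which.
model = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    model[prev].append(nxt)

# "Generate": repeatedly sample a plausible next word.
random.seed(0)
word, out = "the", ["the"]
for _ in range(8):
    word = random.choice(model[word])
    out.append(word)
print(" ".join(out))  # new text that follows the corpus's word patterns
```

Real text generators replace this word-pair table with a neural network that scores every possible next word given the whole context, but the generate-one-token-at-a-time loop is the same.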
🎬 3. Video Generation
Create new videos by learning from existing ones — animations, visual effects, realistic visuals.
🎵 4. Audio Generation
Generate fresh audio content — music, sound effects, speech — inspired by existing recordings.
3.1 Two Broad Model Categories
- Discriminative models — define class boundaries in data; suited for classification. Learn features that differentiate classes (e.g., spam vs not-spam emails).
- Generative models — comprehend the underlying data distribution and generate new samples.
3.2 Generative AI vs Discriminative AI — Comparison
| Aspect | Generative AI | Discriminative AI |
|---|---|---|
| Purpose | Creates new things (images, stories); finds unusual things; learns without precise guidance. | Determines what something is by looking at features; good at telling things apart. |
| Models | Networks that compete (GAN) or guess from patterns to create new things. | Finds rules to separate things — e.g., dog vs cat. |
| Training Focus | Understands what makes data unique; creates new data similar-but-different. | Learns to draw lines / rules that tell things apart by features. |
| Applications | New artworks · new story ideas · anomaly detection in data. | Facial recognition · speech recognition · spam classification. |
| Example Algorithms | Naïve Bayes · Gaussian discriminant analysis · GAN · VAE · LLM · DBM · Autoregressive models | Logistic Regression · Decision Trees · SVM · Random Forest |
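The contrast can be made concrete with toy 1-D "spam score" data (the single feature and all numbers are invented for illustration): the generative view models each class's distribution, which also lets us sample new data, while the discriminative view only learns the boundary between classes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two classes of 1-D data, e.g. a "link count" feature per email.
spam = rng.normal(8.0, 1.0, 500)
ham = rng.normal(2.0, 1.0, 500)

# Generative view: model each class's distribution -> we can SAMPLE new data.
mu_spam, sd_spam = spam.mean(), spam.std()
new_spam_like = rng.normal(mu_spam, sd_spam, 3)  # three generated samples

# Discriminative view: only learn the boundary that separates the classes.
threshold = (spam.mean() + ham.mean()) / 2

def classify(x):
    return "spam" if x > threshold else "not spam"

print(classify(7.5))  # → spam
```

Note how the generative model can do the discriminative job too (compare which class distribution a point fits better), but the discriminative model cannot generate: it knows the boundary, not the data.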
4.1 Ethical and Social Implications of Generative AI
🎭 1. Deepfake Technology
Tools like DeepFaceLab and FaceSwap raise concerns about the authenticity of digital content. Deepfake algorithms can generate compelling fake images/audio/videos — jeopardising trust in media and spreading misinformation.
⚖️ 2. Bias and Discrimination
Generative AI models (e.g., Clearview AI's facial-recognition algorithms) have shown biases that disproportionately affect certain demographic groups — perpetuating inequalities and stereotypes.
📝 3. Plagiarism
Presenting AI-generated content as one's own (intentionally or not) raises ethical questions about intellectual property and academic integrity. If AI output significantly resembles copyrighted material, it could infringe copyright law.
🔍 4. Transparency
Disclosing the use of AI-generated content is essential — especially in academic and professional settings — to uphold ethical standards and prevent academic dishonesty. Failing to disclose AI use erodes trust and credibility.
4.2 Points to Remember + Citing AI Sources
- Be cautious and transparent when using Generative AI.
- Respect copyright — do not present AI output as your own.
- Consult your teacher/institution for specific guidelines.
🔹 Citation Rules
- Intellectual Property — attribute AI-generated content properly; comply with copyright laws.
- Accuracy — verify AI-generated info; cite primary sources wherever possible.
- Ethical Use — acknowledge AI tools; provide context for generated content.
🔹 Citation Example (APA Style)
- Treat the AI as author — cite the tool name (e.g., Bard) + "Generative AI tool" in the author spot.
- Date it right — use the date you received the AI-generated content.
- Show your prompt — briefly mention the prompt given (optional).
Bard (Generative AI tool). (2024, February 20). How to cite generative AI in APA style.
5.1 Hands-on Exploration — 5 Generative AI Tools
🎨 Activity 1 · Canva for Education (Text → Image)
- Visit Canva for Education, sign up with your school email.
- Verify the email, log in.
- Open the Text to Image tool in the design-tools menu.
- Type a prompt — e.g., "A red balloon floating in a clear blue sky."
- Click Generate Image. Canva AI produces multiple options.
- Customise (colours, shapes, backgrounds) and download, or use inside a Canva design.
💬 Activity 2 · Google Gemini (Text Generation)
- Sign in to your Google account.
- Navigate to the Google Gemini platform.
- Agree to the terms of service + privacy policy.
- Craft a descriptive, specific prompt.
- Click the text-generation option, wait for Gemini to process and review output.
🎵 Activity 3 · Veed AI (Text & Music)
- Visit the Veed website → sign up.
- Open the AI Text Generator.
- Enter a concise prompt — e.g., "Write a short story about a lost astronaut searching for their way home."
- Click Generate Text and review.
🎬 Activity 4 · Animaker (Video Generation)
- Visit animaker.com → sign up.
- Open the AI Video Generation tool.
- Enter a detailed prompt — e.g., "Create a promotional video for a new product launch with animated characters and dynamic visuals."
- Click Generate Video and customise.
🤖 Activity 5 · ChatGPT
Visit chat.openai.com → log in → start prompting for text, code, explanations or creative writing.
6.1 Large Language Model (LLM) — Definition
LLMs are deep-learning models that understand and generate human language. They are called "large" because of their scale: billions of parameters, trained on massive datasets of text and code, often spanning trillions of words. Dataset quality directly affects model performance.
🔹 Transformers in LLMs
Transformers — a neural-network architecture that revolutionised NLP and is the engine behind modern LLMs. They enable efficient learning of complex language patterns and long-range relationships in text data.
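The Transformer's core operation, scaled dot-product self-attention, can be sketched in a few lines. To keep the illustration minimal, there are no learned weight matrices here: queries, keys, and values are all set to the raw token embeddings, and the embeddings themselves are toy numbers.

```python
import numpy as np

def self_attention(X):
    """Minimal scaled dot-product self-attention over token embeddings X."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # pairwise token similarity
    # Softmax over each row: how much each token attends to every other token.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X             # each token becomes a mix of its context

# 3 "tokens" with 4-dimensional toy embeddings.
X = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [1., 1., 0., 0.]])
out = self_attention(X)
print(out.shape)  # (3, 4)
```

Because every token attends to every other token in one step, attention captures long-range relationships that older sequence models struggled with, which is why it became the engine of modern LLMs.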
6.2 Leading LLMs in 2024
- OpenAI's GPT-4o — multi-modal, excels at text + images.
- Google's Gemini 1.5 Pro — seamless multi-modal text/image/speech.
- Meta's LLaMA 3.1 — open-source, high efficiency on diverse AI tasks.
- Anthropic's Claude 3.5 — prioritises safety and interpretability.
- Mistral AI's Mixtral 8×7B — sparse mixture of experts for strong performance at smaller model sizes.
6.3 Applications of LLMs
- Text Generation — content creation, dialogue, story/poetry generation, translating natural language → working code, auto-completion for writing & email.
- Audio Generation — LLM-generated text can drive Text-to-Speech (TTS) systems to synthesise natural-sounding speech.
- Image Generation — LLMs do image captioning (generate textual descriptions for images); enhances accessibility.
- Video Generation — generate subtitles, captions, scene summaries; improves searchability.
6.4 Limitations of LLMs
- Processing text needs significant compute → high response time and cost.
- LLMs prioritise natural language over accuracy → can generate factually incorrect or misleading information with high confidence.
- May memorise specifics rather than generalise → poor adaptability.
6.5 Risks of LLMs
- Trained on Internet text → may exhibit biases; data-privacy concerns when personal information is processed.
- Sensitive data in training can inadvertently reveal confidential information.
- Crafted inputs (prompt injection) can lead to harmful or illogical outputs.
6.6 Case Study — LLaMA (Meta AI)
- Trained on publicly available text and code from the internet — fosters transparency and wider accessibility.
- Efficient training techniques require less compute — better scalability.
- Multiple model sizes — from 7 billion to 65 billion parameters; pick what suits your task + hardware.
- Smaller models for everyday tasks; larger models for complex NLP.
- Delivers impressive results on text summarisation & QA — competitive with some much larger proprietary LLMs.
- Applications: personalised learning, content creation, research assistance.
6.7 Practical — Create a Chatbot with Gemini API
Large e-commerce companies like Amazon use AI chatbots for instant customer queries. Beyond customer service, custom chatbots power education (tutoring), research (data analysis), healthcare (appointment booking), and general-public tools.
🔑 Step 1 · Obtain an API Key
- Visit aistudio.google.com → click Get API Key.
- Click Create API Key.
- Copy the key and keep it safe.
🐍 Step 2 · Set Up Python Environment
Install the Gemini SDK (the google-generativeai package):
pip install -q -U google-generativeai
📦 Step 3 · Import Libraries
import google.generativeai as genai
🔐 Step 4 · Initialise Gemini with Your Key
GOOGLE_API_KEY = "YOUR_API_KEY_HERE"
genai.configure(api_key=GOOGLE_API_KEY)
🧠 Step 5 · Choose a Gemini Model
- gemini-pro — optimised for text-only prompts.
- gemini-pro-vision — optimised for text-and-image prompts.
model = genai.GenerativeModel('gemini-pro')
💬 Step 6 · Create a Chat Session
chat = model.start_chat(history=[])
while True:
    user_input = input("You: ")
    if user_input.lower() in ['exit', 'quit']:
        break
    response = chat.send_message(user_input)
    print("Gemini:", response.text)
6.8 Future of Generative AI
- Evolving architectures to surpass current capabilities.
- Ethical development — minimise biases, ensure responsible use.
- Addressing complex challenges in healthcare, education, multilingual translation.
- Expansion in multimedia content creation.
- Deeper human-AI collaboration — AI as a supportive partner across domains.
- Primary objective of Generative AI → generate new data resembling its training samples.
- Common Gen-AI algorithms → GANs & VAEs.
- Purpose of Gen AI in text → create coherent, contextually relevant text from prompts.
- Gen-AI model famous for text → OpenAI's ChatGPT.
- Image generation analogy → computers make new pictures based on patterns they have learned.
- Not a Gen-AI application → analysing sentiment in social media (that's discriminative).
- Gen vs Disc training focus → Gen learns the data distribution; Disc identifies class boundaries.
- Tech raising authenticity concerns → Deepfake AI.
Quick Revision — Key Points to Remember
- Generative AI creates new content (text · image · audio · video) resembling its training samples.
- Famous tools: ChatGPT · Gemini · Claude · DALL-E.
- Two core architectures: GAN (Generator + Discriminator, adversarial) · VAE (Encoder + Decoder, latent space).
- GAN vs VAE: GAN = realistic visuals; VAE = structured data + interpretable latent space.
- 4 application domains: Image (DALL-E, Canva, Stable Diffusion) · Text (ChatGPT, Gemini, Perplexity) · Video (Lumiere, Deepfake) · Audio (Voicebox, Music LM).
- Generative vs Discriminative: Gen creates new data + learns distribution; Disc classifies + learns boundaries. Example Disc algos: Logistic Regression, Decision Trees, SVM, Random Forest. Example Gen algos: Naïve Bayes, GAN, VAE, LLM, DBM, Autoregressive.
- LLM = deep-learning model on trillions of words; core architecture = Transformer.
- Leading LLMs (2024): GPT-4o · Gemini 1.5 Pro · LLaMA 3.1 · Claude 3.5 · Mixtral 8×7B.
- LLM limitations: high compute · factual errors · memorisation over generalisation.
- LLM risks: bias · privacy leaks · prompt injection.
- LLaMA case study: publicly trained · efficient · sizes 7B–65B params.
- 4 Ethical concerns: Deepfake (DeepFaceLab, FaceSwap) · Bias (Clearview AI, HireVue) · Plagiarism · Transparency.
- Citing AI in APA: Tool-name (Generative AI tool). Date received. Topic. Prompt in brackets.
- Hands-on tools: Canva (Text→Image) · Gemini (text) · Veed (text/music) · Animaker (video) · ChatGPT.
- Gemini API chatbot pipeline: Get API key → pip install google-generativeai → import → configure(api_key) → GenerativeModel('gemini-pro') → start_chat → send_message.
- Gemini models: gemini-pro (text-only) · gemini-pro-vision (text + images).
- Future: ethical development · healthcare/education breakthroughs · multimodal creation · human-AI collaboration.
- Generative AI is a form of Weak AI.