PART B ▪ UNIT 4
09
Statistical Data
Data Science · No-Code AI · Orange Data Mining · Statistics
Data Science is the field that unifies statistics, data analysis, machine learning and related methods to understand and analyse real-world phenomena with data. It uses techniques from Mathematics, Statistics, Computer Science and Information Science. This unit introduces No-Code AI tools (Orange Data Mining, MS Excel) to work with statistical data — so you can build AI models without writing code.
This unit is practical-based and has two sub-units: (4.1) Introduction to Data Science and No-Code AI (Orange Data Mining, Lobe, Teachable Machine, etc.); (4.2) Statistical Data Use-Case Walkthrough, covering important statistical concepts, MS Excel, and a full Orange workflow with the Palmer Penguins case study.

Introduction — Data Science & Its Applications

AI depends entirely on data. Based on the type of data fed to a machine, AI splits into three broad domains — Data Science (statistical data), Computer Vision, and NLP. Data Science is the foundation that makes machines understand numbers, patterns and trends.

🔹 Real-World Applications of Data Science
🔎 Internet Search: Google, Bing and Yahoo all use data-science algorithms to deliver the best result in a fraction of a second. Google processes more than 20 petabytes of data daily.
📣 Targeted Advertising: From display banners to digital billboards at airports, almost all digital ads are chosen using data-science algorithms. This is why digital ads have higher Click-Through Rates (CTR) than traditional ones.
🛒 Website Recommendations: Amazon, Twitter, Google Play, Netflix, LinkedIn and IMDB all recommend products or content based on a user's past searches and activity.
🧬 Genetics & Genomics: Data science integrates different kinds of genomic data to understand reactions to drugs and diseases, enabling personalised treatment.
🏥 Healthcare Analytics: Predicting patient outcomes, drug discovery and hospital resource planning are all driven by statistical AI.
💳 Fraud Detection: Banks analyse transaction patterns in milliseconds to flag fraudulent purchases.
📈 Stock Market: Predicting prices, trends and volatility through statistical modelling.
🌤️ Weather Forecasting: IMD and weather apps use historical and real-time data to forecast rain, storms and temperature.
Learning Outcome 1: Define No-Code / Low-Code AI and identify differences

4.1 No-Code AI Tool — Introduction

Imagine: You want to build a food-delivery application. How do you start? Three approaches exist — High-Code, Low-Code, No-Code.

🧑‍💻 High-Code · Low-Code · No-Code Comparison

Aspect | High-Code | Low-Code | No-Code
What it is | Traditional development: coders write all code manually using Java, Python, C#. | Platforms with visual interfaces and pre-built components. Some manual code is still needed. | Create applications without any coding or scripting. Drag-and-drop only.
Coding knowledge | Mandatory: deep expertise needed. | Partial: developers write some code. | Not required: anyone can build.
Cost | Expensive. | Less expensive than high-code. | Least expensive.
Customisation | Full: you own the product and can make anything. | Limited customisation. | Lacks customisable options; limited to the tool's built-in functions.
Ease of use | Complex: needs coders. | Moderate. | Simple: drag and drop.
Example | Custom chatbot built from scratch. | Mendix, OutSystems, Microsoft Power Apps. | Orange Data Mining, Teachable Machine, Lobe.
Custom code is also known as high-code. The company owns everything it builds.

4.2 Why Do We Need No-Code AI?

🐞 No Code Errors: Coding tends to produce many types of errors, which can be troublesome. No code = no code errors!
💰 Saves Cost: Fully coded AI systems are costly to build. No-Code helps businesses cut expenses.
👥 No AI Hiring Needed: Companies can implement AI without hiring specialised AI staff, which means less stress.
🎓 Easy to Use: Even middle-school students can create AI using No-Code tools.
👁️ Visual / Drag & Drop: You see what you're building in real time through an intuitive interface.
⚡ Fast Prototyping: Build and test AI ideas in minutes, not weeks.

4.3 Who Can Use No-Code AI?

No-Code AI makes AI accessible to the general public. Non-technical people — doctors, architects, musicians, teachers — can quickly construct accurate AI models without any coding.

Scenario — Kayla the Zoo Dietitian:
Kayla manages the food budget at a zoo. With rising prices of meat and vegetables, the zoo wants to predict future food prices to raise sponsorship. Kayla has never coded, but using a No-Code tool like Orange Data Mining, she builds a price-prediction model herself!

4.4 Benefits of No-Code Tools

  • Accessibility — anyone, even non-tech people, can use AI.
  • Cost-effective — much cheaper than writing code.
  • Time-saving — drag-and-drop builds models in minutes.
  • Visual learning — easy to understand workflows.
  • Real-time feedback — see results as you build.
  • Low risk of errors — tool handles coding automatically.
  • Easy iteration — change parameters, re-run in seconds.

4.5 Disadvantages of No-Code Tools

🔗 Lack of Flexibility: Drag-and-drop is convenient, but you're limited to fixed elements. Customisation is restricted.
🤖 Automation Bias: Humans tend to favour suggestions from automated systems and ignore contradictory information, even when the automation is wrong.
🔒 Security Issues: No-code platforms don't always enforce security best practices, so they are not ideal for sensitive data.

4.6 Popular No-Code AI Tools

Tool | Released | Details
Azure Machine Learning | July 2014 | Cloud-based service by Microsoft. Build ML models without coding: clean data, train and evaluate models, and put them into production.
Google Cloud AutoML | January 2018 | Users with limited ML knowledge can train high-quality custom models specific to business needs. Build in minutes and use in apps and websites.
Orange Data Mining | October 1996 | Open-source data-visualisation, ML and data-mining toolkit by the University of Ljubljana. Perform data analysis through drag-and-drop widgets.
Lobe AI | 2015 | Machine-learning platform to create custom ML models using a visual interface. A free, easy-to-use tool that auto-trains a custom model shippable in an app.
Teachable Machine | November 2017 | Web-based tool by Google. Train a computer to recognise your own images, sounds and poses. No expertise or coding required.
DataRobot | 2012 | Automated ML platform for enterprise-grade models.

4.7 Building a Simple Price Prediction Model — Orange Data Mining

Let's help Kayla build a food-price prediction model in Orange Data Mining.

🔄 7-STEP WORKFLOW IN ORANGE
1. Download Dataset 2. Open Orange 3. Upload Dataset (File widget) 4. View Dataset (Data Table)
5. Select Model (Linear Regression) 6. Evaluate (Test & Score) 7. Prediction
Step | Action
Step 1 | Download the dataset from fao.org/worldfoodsituation/foodpricesindex/
Step 2 | Double-click the Orange icon to open the tool.
Step 3 | Click the File widget under the Data menu so that it appears on the canvas. Click it to browse and upload the dataset. Select Food Price Index as the target variable.
Step 4 | Click the Data Table widget under the Data menu. Connect File → Data Table. Click it to view the dataset.
Step 5 | Click the Linear Regression widget under the Model menu. Connect File → Linear Regression.
Step 6 | Click the Test & Score widget under the Evaluate menu. Connect File + Linear Regression → Test & Score. View the performance parameters.
Step 7 | Click the Predictions widget under the Evaluate menu. Connect Linear Regression → Predictions (the trained model) and File → Predictions (the data). Click it to view the predicted prices.
Kayla now has a model that can predict future food prices — without writing a single line of code! She can now make a systematic fund-raising plan for the zoo.
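For readers curious about what the widgets do behind the canvas, here is a minimal Python sketch of the same idea. It is not how Orange is implemented, and the (month, index) pairs are made up for illustration; they are not real FAO figures. The function names are hypothetical, and the evaluation uses a simple hold-out of the last two months, whereas Test & Score offers cross-validation among its options.

```python
from math import sqrt

# Steps 1-4: the dataset. These (month, index) pairs are made up for
# illustration and are NOT real FAO Food Price Index values.
months = [1, 2, 3, 4, 5, 6, 7, 8]
price_index = [100.0, 103.0, 104.0, 107.0, 108.0, 111.0, 112.0, 115.0]


def train_linear_regression(xs, ys):
    """Step 5: fit y = slope * x + intercept by least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def rmse(model, xs, ys):
    """Step 6: root mean squared error on rows the model has not seen."""
    slope, intercept = model
    squared_errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Train on the first six months, hold out the last two for testing.
model = train_linear_regression(months[:6], price_index[:6])
test_error = rmse(model, months[6:], price_index[6:])

# Step 7: predict the index for the next, unseen month.
slope, intercept = model
forecast_month_9 = slope * 9 + intercept

print(f"RMSE on held-out months: {test_error:.2f}")
print(f"Forecast for month 9: {forecast_month_9:.1f}")
```

A small RMSE on the held-out months suggests the straight line follows the trend well; a large one would be a hint to try a different model before trusting the forecast.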

4.8 Other No-Code Tools — Lobe & Teachable Machine

🎨 Lobe
  • Makes ML easy with everything needed to turn ideas into models.
  • Train models with a free, easy-to-use tool.
  • Auto-trains custom ML model shippable in your app.
🎯 Teachable Machine
  • Web-based tool by Google.
  • Train a computer to recognise images, sounds, poses.
  • Fast, easy, accessible to everyone — no expertise or coding required.
Learning Outcome 2: Statistical concepts · MS Excel · Orange Data Mining Use Case

4.9 Important Concepts in Statistics

📊 1. Statistical Sampling

🎯 2. Descriptive Statistics — Mean, Median, Mode

Describe the data and help understand its underlying characteristics.

📏 Mean: The central / average value. Sum of all values ÷ number of values.
🎯 Median: The middle value when the data is ordered from low to high. With an even number of values, it is the average of the two middle values.
🔢 Mode: The value which occurs most often in the dataset.
For data 2, 3, 3, 5, 7, 8, 9 →
• Mean = (2+3+3+5+7+8+9) / 7 = 37 / 7 ≈ 5.29
• Median = 5 (middle value)
• Mode = 3 (appears twice)
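The same three values can be checked with Python's built-in statistics module. This is only a quick verification sketch of the worked example above; the variable names are illustrative.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 8, 9]

data_mean = mean(data)      # sum of values / number of values = 37 / 7
data_median = median(data)  # middle value of the sorted list
data_mode = mode(data)      # value that occurs most often

print(f"Mean   = {data_mean:.2f}")  # Mean   = 5.29
print(f"Median = {data_median}")    # Median = 5
print(f"Mode   = {data_mode}")      # Mode   = 3
```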

📈 3. Distributions

🎲 4. Probability

📉 5. Variance, Standard Deviation, Outlier
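The three terms in this heading can be illustrated with a short Python sketch. It rests on two assumptions that are not stated in these notes: variance is computed as the population variance (the average squared distance from the mean), and an outlier is flagged with the common 1.5 × IQR rule. The value 40 is added purely for illustration.

```python
from statistics import pstdev, pvariance, quantiles

data = [2, 3, 3, 5, 7, 8, 9]

variance = pvariance(data)  # average squared distance from the mean
std_dev = pstdev(data)      # square root of the variance


def find_outliers(values):
    """Flag values outside the 1.5 * IQR fences (an assumed rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# 40 is an illustrative extreme value appended to the original data.
outliers = find_outliers(data + [40])

print(f"Variance = {variance:.2f}")
print(f"Standard deviation = {std_dev:.2f}")
print(f"Outliers = {outliers}")
```

Standard deviation is usually easier to interpret than variance because it is in the same units as the data.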

4.10 MS Excel for Statistical Analysis

MS Excel is one of the most accessible statistical tools. With the Analysis ToolPak add-in, Excel can perform regression, histograms, descriptive statistics, and much more.

Activity — Speed vs Distance Linear Regression in Excel:
  1. Get the Add-in: File → Options → Add-ins → Analysis ToolPak → Go → Check → OK. The Data Analysis option appears in the Data menu.
  2. View Data: Identify the independent (X = Speed) and dependent (Y = Distance) features.
  3. Visualise: Select both columns → Insert → Charts → Scatter. Add the chart title "Distance vs Speed".
  4. Add Regression Line: Click the scatter plot → Chart Design → Add Chart Element → Trendline → More Trendline Options → Linear → tick "Display Equation on chart" and "Display R-squared value on chart".
  5. Verify Coefficients: Data → Data Analysis → Regression → Y Range (Distance column with label) → X Range (Speed column with label) → tick Labels → choose output cell → OK. Summary stats appear.
  6. Predict: Use the generated equation y = mx + c, where m is the slope and c is the intercept. To find the Distance for Speed = 6, substitute x = 6.
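The trendline equation and R-squared value that Excel displays come from ordinary least squares. The sketch below shows that calculation in Python under stated assumptions: the speed and distance readings are made up for illustration (they are not the activity's worksheet), and fit_line and r_squared are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    c = (sum_y - m * sum_x) / n
    return m, c


def r_squared(xs, ys, m, c):
    """Share of the variation in y that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up readings for illustration; not the activity's worksheet data.
speed = [1, 2, 3, 4, 5]
distance = [2, 4, 5, 4, 5]

m, c = fit_line(speed, distance)
r2 = r_squared(speed, distance, m, c)
distance_at_6 = m * 6 + c  # Step 6: substitute Speed = 6 into y = mx + c

print(f"y = {m:.2f}x + {c:.2f}")                     # y = 0.60x + 2.20
print(f"R-squared = {r2:.2f}")                       # R-squared = 0.60
print(f"Distance at Speed 6 = {distance_at_6:.2f}")  # Distance at Speed 6 = 5.80
```

An R-squared close to 1 means the line explains most of the variation in Distance; here 0.60 signals a moderate fit, so the prediction should be read as a rough estimate.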

4.11 Orange Data Mining — What is It?

Orange Data Mining (ODM) is an open-source data-mining and machine-learning software suite designed for data analysis, visualisation and exploration. It has a graphical user interface (GUI) that lets users interactively build data-analysis workflows using components called widgets.
🔹 Key Features of Orange

4.12 Orange Widgets — By Category

📥 1. Data Loading Widgets

Bring your data into Orange from files or online sources, for example with the File and URL widgets, and use Data Table to view what was loaded.

🔍 2. Data Exploration Widgets

Look at data in different ways to spot patterns, for example with the Scatter Plot and Distributions widgets.

🧼 3. Preprocessing Widgets

Clean up data before modelling, for example with Impute and Normalize.

🎯 4. Feature Selection Widgets

🤖 5. Modelling Widgets

Build models from your data, for example with Classification Tree, k-Means, SVM, Linear Regression and Logistic Regression.

📊 6. Evaluation Widgets

Check model performance, for example with Test & Score, Cross Validation and ROC.

📈 7. Visualization Widgets

Turn data into visual representations, for example Bar Chart and Heat Map.

4.13 AI Project Cycle Mapped to Orange Data Mining

AI Project Cycle Stage | Orange Widget / Action
1. Problem Scoping | Define the problem statement (done outside Orange: what are you predicting?).
2. Data Acquisition | File / URL widget to load the dataset.
3. Data Exploration | Data Table, Scatter Plot, Distributions widgets to explore.
4. Modeling | Linear Regression, Classification Tree, k-Means, SVM, Logistic Regression widgets.
5. Evaluation | Test & Score, Cross Validation, ROC Curve widgets.
6. Deployment | Prediction widget; then integrate into an application.

4.14 Case Study — Palmer Penguins

About the dataset: The Palmer Penguins dataset records measurements of penguins observed on islands in the Palmer Archipelago, in the Antarctic Peninsula region. Researchers study the penguins' behaviour, habitat, population dynamics and the effects of climate change. The dataset is a popular alternative to the famous Iris dataset and is available on Kaggle.

🎯 Stage 1 — Problem Scoping

The researchers want to predict the species of a penguin from its recorded measurements. The dataset covers three species (Adelie, Chinstrap, Gentoo), and physical features differ across them.
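To make the classification task concrete, here is a minimal nearest-neighbour sketch in Python. It is only an illustration of predicting a species from measurements, not the model used in this case study. The rows are illustrative values chosen to look plausible, not records copied from the dataset, and the function names are hypothetical.

```python
from math import dist

# (bill length mm, bill depth mm, flipper length mm, body mass g), species.
# Illustrative values only; they are NOT rows copied from the dataset.
training_rows = [
    ((39.0, 18.5, 190.0, 3700.0), "Adelie"),
    ((38.0, 18.0, 186.0, 3500.0), "Adelie"),
    ((49.0, 18.5, 196.0, 3750.0), "Chinstrap"),
    ((50.5, 19.0, 200.0, 3900.0), "Chinstrap"),
    ((47.0, 15.0, 217.0, 5000.0), "Gentoo"),
    ((48.5, 14.5, 220.0, 5300.0), "Gentoo"),
]


def column_ranges(rows):
    """Minimum and maximum of each feature column in the training rows."""
    columns = zip(*(features for features, _ in rows))
    return [(min(col), max(col)) for col in columns]


def scale(features, ranges):
    """Min-max scale each feature so body mass does not dominate distance."""
    return [(value - low) / (high - low)
            for value, (low, high) in zip(features, ranges)]


def predict_species(features, rows):
    """Return the species of the closest training row (1-nearest neighbour)."""
    ranges = column_ranges(rows)
    target = scale(features, ranges)
    nearest = min(rows, key=lambda row: dist(scale(row[0], ranges), target))
    return nearest[1]


print(predict_species((46.5, 14.8, 215.0, 5100.0), training_rows))  # Gentoo
print(predict_species((38.5, 18.2, 188.0, 3600.0), training_rows))  # Adelie
```

Scaling matters here because body mass is measured in thousands of grams while bill depth is in tens of millimetres; without it, mass alone would decide which row counts as nearest.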

📥 Stage 2 — Data Acquisition

Features in the dataset include Bill Length, Bill Depth, Flipper Length, Body Mass, Island and Sex.

🔍 Stage 3 — Data Exploration

🤖 Stage 4 — Modelling

📊 Stage 5 — Evaluation

🚀 Stage 6 — Prediction / Deployment

4.15 Limitations of No-Code AI Tools

Quick Revision — Key Points to Remember

  • Data Science = unifies statistics + data analysis + ML + their related methods.
  • Applications: internet search (Google processes 20 PB/day), targeted ads, recommendations (Amazon/Netflix), genetics & genomics, healthcare, fraud detection, stock market, weather.
  • 3 coding approaches: High-Code (manual coding) · Low-Code (visual + some code) · No-Code (drag-drop only, no coding).
  • Why No-Code AI: no code errors, saves cost, no AI hiring, easy to use, visual, fast prototyping.
  • Who uses No-Code: non-technical people — doctors, architects, musicians, teachers.
  • Benefits: accessible, cost-effective, time-saving, visual, low-risk, easy iteration.
  • Disadvantages: lack of flexibility · automation bias · security issues.
  • Popular No-Code Tools: Azure ML (2014) · Google Cloud AutoML (2018) · Orange Data Mining (1996) · Lobe AI (2015) · Teachable Machine (2017) · DataRobot (2012).
  • Kayla's Zoo Example: food-price prediction built in Orange without coding using 7 steps.
  • Statistics concepts: Population & Sample · Mean (average) · Median (middle) · Mode (most frequent) · Distribution (frequency chart) · Normal distribution · Probability · Variance · Standard Deviation · Outlier.
  • MS Excel: Analysis ToolPak add-in → scatter plot → trendline → regression equation (y = mx + c).
  • Orange Data Mining = open-source drag-and-drop data-mining tool by Uni of Ljubljana.
  • Orange widget categories: Data Loading (File/URL/Data Table) · Exploration (Scatter Plot, Distributions) · Preprocessing (Impute, Normalize) · Feature Selection · Modelling (Classification Tree, k-Means, SVM, Linear/Logistic Regression) · Evaluation (Test & Score, Cross Validation, ROC) · Visualization (Bar Chart, Heat Map).
  • AI Project Cycle in Orange: Problem Scoping → File widget → Data Table/Scatter → Model widget → Test & Score → Prediction.
  • Palmer Penguins Case Study: classify species (Adelie, Chinstrap, Gentoo) using Bill Length, Bill Depth, Flipper Length, Body Mass, Island, Sex.
  • No-Code limitations: limited customisation, scalability issues, security concerns, automation bias.