PART B ▪ UNIT 4
09
Statistical Data
Data Science · No-Code AI · Orange Data Mining · Statistics
Data Science is the field that unifies statistics, data analysis, machine learning and related methods to understand and analyse real-world phenomena with data. It uses techniques from Mathematics, Statistics, Computer Science and Information Science. This unit introduces No-Code AI tools (Orange Data Mining, MS Excel) to work with statistical data — so you can build AI models without writing code.
This unit is practical-based. It has two sub-units: (4.1) Introduction to Data Science + No-Code AI (Orange Data Mining, Lobe, Teachable Machine, etc.). (4.2) Statistical Data Use-Case Walkthrough — important statistical concepts + MS Excel + full Orange workflow with Palmer Penguins case study.

Introduction — Data Science & Its Applications

AI depends entirely on data. Based on the type of data fed to a machine, AI splits into three broad domains — Data Science (statistical data), Computer Vision, and NLP. Data Science is the foundation that makes machines understand numbers, patterns and trends.

Real-World Applications of Data Science
  • Internet Search: Google, Bing and Yahoo all use data-science algorithms to deliver the best result in a fraction of a second. Google processes more than 20 petabytes of data daily.
  • Targeted Advertising: From display banners to digital billboards at airports, almost all digital ads are decided using data-science algorithms. This is why digital ads have higher Click-Through Rates (CTR) than traditional ones.
  • Website Recommendations: Amazon, Twitter, Google Play, Netflix, LinkedIn and IMDB all recommend products or content based on past search results.
  • Genetics & Genomics: Data science integrates different kinds of genomic data to understand reactions to drugs and diseases, enabling personalised treatment.
  • Healthcare Analytics: Predicting patient outcomes, drug discovery and hospital resource planning are all driven by statistical AI.
  • Fraud Detection: Banks analyse transaction patterns in milliseconds to flag fraudulent purchases.
  • Stock Market: Predicting prices, trends and volatility through statistical modelling.
  • Weather Forecasting: IMD and weather apps use historical + real-time data to forecast rain, storms and temperature.
Learning Outcome 1: Define No-Code / Low-Code AI and identify differences

4.1 No-Code AI Tool — Introduction

Imagine: You want to build a food-delivery application. How do you start? Three approaches exist — High-Code, Low-Code, No-Code.

High-Code · Low-Code · No-Code Comparison

| Aspect | High-Code | Low-Code | No-Code |
|---|---|---|---|
| What it is | Traditional development: coders write all code manually using Java, Python, C#. | Platforms with visual interfaces + pre-built components. Some manual code still needed. | Create applications without any coding or scripting. Drag-and-drop only. |
| Coding knowledge | Mandatory; deep expertise needed. | Partial; developers write some code. | Not required; anyone can build. |
| Cost | Expensive. | Less expensive than high-code. | Least expensive. |
| Customisation | Full: you own the product and can make anything. | Limited customisation. | Lacks customisable options; limited to the tool's built-in functions. |
| Ease of use | Complex; needs coders. | Moderate. | Simple: drag & drop. |
| Example | Custom chatbot built from scratch. | Mendix, OutSystems, Microsoft Power Apps. | Orange Data Mining, Teachable Machine, Lobe. |
Custom code is also known as high-code. The company owns everything they build.

4.2 Why Do We Need No-Code AI?

  • No Code Errors: We tend to run into many types of errors when coding, which is troublesome at times. No code = no code errors!
  • Saves Cost: Fully coded AI systems are costly to build. No-Code helps businesses cut expenses.
  • No AI Hiring Needed: Companies can implement AI without hiring specialised AI staff, which means less stress.
  • Easy to Use: Even middle-school students can create AI using No-Code tools.
  • Visual / Drag & Drop: You see what you're building in real time through an intuitive interface.
  • Fast Prototyping: Build and test AI ideas in minutes, not weeks.

4.3 Who Can Use No-Code AI?

No-Code AI makes AI accessible to the general public. Non-technical people — doctors, architects, musicians, teachers — can quickly construct accurate AI models without any coding.

Scenario — Kayla the Zoo Dietitian:
Kayla manages the food budget at a zoo. With rising prices of meat and vegetables, the zoo wants to predict future food prices to raise sponsorship. Kayla has never coded, but using a No-Code tool like Orange Data Mining, she builds a price-prediction model herself!

4.4 Benefits of No-Code Tools

  • Accessibility — anyone, even non-tech people, can use AI.
  • Cost-effective — much cheaper than writing code.
  • Time-saving — drag-and-drop builds models in minutes.
  • Visual learning — easy to understand workflows.
  • Real-time feedback — see results as you build.
  • Low risk of errors — tool handles coding automatically.
  • Easy iteration — change parameters, re-run in seconds.

4.5 Disadvantages of No-Code Tools

  • Lack of Flexibility: Drag-and-drop is convenient but you're limited to fixed elements. Customisation is restricted.
  • Automation Bias: Humans tend to favour suggestions from automated systems and ignore contradictory information, even when the automation is wrong.
  • Security Issues: No-code platforms don't always enforce security best practices, so they are not ideal for sensitive data.

4.6 Popular No-Code AI Tools

| Tool | Released | Details |
|---|---|---|
| Azure Machine Learning | July 2014 | Cloud-based service by Microsoft. Build ML models without coding, clean data, train and evaluate models, and put them into production. |
| Google Cloud AutoML | January 2018 | Users with limited ML knowledge can train high-quality custom models specific to business needs. Build in minutes and use in apps and websites. |
| Orange Data Mining | October 1996 | Open-source data-visualisation + ML + data-mining toolkit by University of Ljubljana. Perform data analysis through drag-and-drop widgets. |
| Lobe AI | 2015 | Machine-learning platform to create custom ML models using a visual interface. Train models with a free, easy-to-use tool that auto-trains a custom model shippable in an app. |
| Teachable Machine | November 2017 | Web-based tool by Google. Train a computer to recognise your own images, sounds and poses. No expertise or coding required. |
| DataRobot | 2012 | Automated ML platform for enterprise-grade models. |

4.7 Building a Simple Price Prediction Model — Orange Data Mining

Let's help Kayla build a food-price prediction model in Orange Data Mining.

7-STEP WORKFLOW IN ORANGE
Download Dataset → Open Orange → Upload Dataset (File widget) → View Dataset (Data Table) → Select Model (Linear Regression) → Evaluate (Test & Score) → Prediction

| Step | Action |
|---|---|
| Step 1 | Download the dataset from fao.org/worldfoodsituation/foodpricesindex/ |
| Step 2 | Double-click the Orange icon to open the tool. |
| Step 3 | Click the File widget under Data Menu → appears on canvas. Click to browse and upload the dataset. Select Food Price Index as the target variable. |
| Step 4 | Click the Data Table widget under Data Menu. Connect File → Data Table. Click to view the dataset. |
| Step 5 | Click Linear Regression widget under Model Menu. Connect File → Linear Regression. |
| Step 6 | Click Test & Score widget under Evaluate Menu. Connect File + Linear Regression → Test & Score. View performance parameters. |
| Step 7 | Click Prediction widget under Evaluate Menu. Connect Test & Score → Prediction. Click to view predicted prices. |
Kayla now has a model that can predict future food prices — without writing a single line of code! She can now make a systematic fund-raising plan for the zoo.
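
Orange needs no code, but it can help to see roughly what the widgets do behind the scenes. The sketch below mirrors Steps 3 to 7 in plain Python. It is only an illustration under stated assumptions: the column names (`month_index`, `food_price_index`) and every value in the tiny table are invented and deliberately form an exact straight line, and a simple hold-out split stands in for whatever sampling Test & Score is configured to use.

```python
import csv
import io
import math

# Hypothetical miniature dataset (values invented for illustration only;
# the real FAO file has different columns and values).
RAW_CSV = """month_index,food_price_index
1,100.0
2,102.0
3,104.0
4,106.0
5,108.0
6,110.0
7,112.0
8,114.0
"""

def load_rows(text):
    """Steps 3-4: read the table, like the File and Data Table widgets."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(r["month_index"]), float(r["food_price_index"])) for r in reader]

def fit_line(rows):
    """Step 5: least-squares fit of target = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(model, rows):
    """Step 6: root-mean-square error on rows the model did not see."""
    slope, intercept = model
    errors = [(slope * x + intercept - y) ** 2 for x, y in rows]
    return math.sqrt(sum(errors) / len(errors))

def predict(model, x):
    """Step 7: predict the target for a new input."""
    slope, intercept = model
    return slope * x + intercept

rows = load_rows(RAW_CSV)
train, test = rows[:6], rows[6:]  # simple hold-out split (an assumption)
model = fit_line(train)
test_rmse = rmse(model, test)
next_value = predict(model, 9)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}")
print(f"hold-out RMSE={test_rmse:.4f}, prediction for month 9={next_value:.2f}")
```

Because the invented values lie exactly on a line, the error on the held-out rows is zero. Real price data would give a non-zero error, which is what the performance parameters in Test & Score report.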

4.8 Other No-Code Tools — Lobe & Teachable Machine

Lobe
  • Makes ML easy with everything needed to turn ideas into models.
  • Train models with a free, easy-to-use tool.
  • Auto-trains custom ML model shippable in your app.
Teachable Machine
  • Web-based tool by Google.
  • Train a computer to recognise images, sounds, poses.
  • Fast, easy, accessible to everyone — no expertise or coding required.
Learning Outcome 2: Statistical concepts · MS Excel · Orange Data Mining Use Case

4.9 Important Concepts in Statistics

1. Statistical Sampling

2. Descriptive Statistics — Mean, Median, Mode

Describe the data and help understand its underlying characteristics.

  • Mean: The central / average value. Sum of all values ÷ number of values.
  • Median: The middle value when data is ordered from low to high and divided exactly in half.
  • Mode: The value which occurs most often in the dataset.
For data 2, 3, 3, 5, 7, 8, 9 →
• Mean = (2+3+3+5+7+8+9) / 7 = 37 / 7 ≈ 5.29
• Median = 5 (middle value)
• Mode = 3 (appears twice)
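
The same three measures can be checked with a few lines of Python. This is a minimal sketch using the standard-library `statistics` module on the worked data above; the variable names are only illustrative.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 9]

mean_value = statistics.mean(data)      # sum of values / number of values
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequent value

print(f"Mean   = {mean_value:.2f}")
print(f"Median = {median_value}")
print(f"Mode   = {mode_value}")
```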

3. Distributions

4. Probability

5. Variance, Standard Deviation, Outlier
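
As a rough illustration of these three ideas: variance is the average squared distance of the values from the mean, standard deviation is its square root, and an outlier is a value that lies unusually far from the rest. The sketch below is not a definitive method. It uses the population formulas, and the cut-off of 2 standard deviations is an assumption chosen for the example. The value 40 is added to the earlier data only to create an obvious outlier.

```python
import statistics

def describe_spread(values):
    """Return (population variance, population standard deviation)."""
    variance = statistics.pvariance(values)
    return variance, variance ** 0.5

def find_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean.

    The 2.0 cut-off is an illustrative assumption, not a fixed rule.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]

scores = [2, 3, 3, 5, 7, 8, 9, 40]  # 40 is added here as an obvious outlier
variance, std_dev = describe_spread(scores)
print(f"variance={variance:.2f}, std dev={std_dev:.2f}")
print("outliers:", find_outliers(scores))
```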

4.10 MS Excel for Statistical Analysis

MS Excel is one of the simplest tools for statistical analysis. With the Analysis ToolPak add-in, Excel can perform regression, histograms, descriptive statistics, and much more.

Activity — Speed vs Distance Linear Regression in Excel:
  1. Get the add-in: File → Options → Add-ins → Analysis ToolPak → Go → tick Analysis ToolPak → OK. The Data Analysis option then appears on the Data tab.
  2. View the data: identify the independent (X = Speed) and dependent (Y = Distance) features.
  3. Visualise: select both columns → Insert → Charts → Scatter. Add the chart title "Distance vs Speed".
  4. Add a regression line: click the scatter plot → Chart Design → Add Chart Element → Trendline → More Trendline Options → Linear → tick "Display Equation on chart" and "Display R-squared value on chart".
  5. Verify the coefficients: Data → Data Analysis → Regression → Y Range (Distance column with label) → X Range (Speed column with label) → tick Labels → choose an output cell → OK. The summary statistics appear.
  6. Predict: use the generated equation y = mx + c, where m is the slope and c is the intercept. For Speed = 6, substitute x = 6 to calculate Distance.
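
For readers curious about what the trendline computes, here is a minimal Python sketch of the same least-squares fit. The speed and distance readings are invented for illustration and are not the activity's actual data. The function name `linear_fit` is likewise just a label for this example.

```python
def linear_fit(x_values, y_values):
    """Least-squares slope m, intercept c and R-squared for y = m*x + c."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    m = sxy / sxx
    c = mean_y - m * mean_x
    # R-squared: share of the variation in y explained by the line
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(x_values, y_values))
    ss_tot = sum((y - mean_y) ** 2 for y in y_values)
    return m, c, 1 - ss_res / ss_tot

# Hypothetical speed/distance readings, invented for illustration.
speed = [1, 2, 3, 4, 5]
distance = [3, 5, 7, 10, 10]

m, c, r_squared = linear_fit(speed, distance)
predicted = m * 6 + c  # the Predict step: Distance for Speed = 6
print(f"y = {m:.2f}x + {c:.2f}, R^2 = {r_squared:.3f}")
print(f"Predicted distance at speed 6: {predicted:.2f}")
```

The slope and intercept correspond to the equation Excel displays on the chart, and the R-squared value corresponds to the one shown beside it.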

4.11 Orange Data Mining — What is It?

Orange Data Mining (ODM) is an open-source data-mining and machine-learning software suite designed for data analysis, visualisation and exploration. It has a graphical user interface (GUI) that lets users interactively build data-analysis workflows using components called widgets.
Key Features of Orange

4.12 Orange Widgets — By Category

1. Data Loading Widgets

Bring your data into Orange from files or online sources, using widgets such as File, URL and Data Table.

2. Data Exploration Widgets

Look at data in different ways to spot patterns, using widgets such as Scatter Plot and Distributions.

3. Preprocessing Widgets

Clean up data before modelling, using widgets such as Impute and Normalize.

4. Feature Selection Widgets

5. Modelling Widgets

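Build models from your data, using widgets such as Classification Tree, k-Means, SVM, Linear Regression and Logistic Regression.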

6. Evaluation Widgets

Check model performance, using widgets such as Test & Score, Cross Validation and ROC Curve.

7. Visualization Widgets

Turn data into visual representations, using widgets such as Bar Chart and Heat Map.

4.13 AI Project Cycle Mapped to Orange Data Mining

| AI Project Cycle Stage | Orange Widget / Action |
|---|---|
| 1. Problem Scoping | Define the problem statement (done outside Orange: what are you predicting?). |
| 2. Data Acquisition | File / URL widget to load the dataset. |
| 3. Data Exploration | Data Table, Scatter Plot, Distributions widgets to explore. |
| 4. Modeling | Linear Regression, Classification Tree, k-Means, SVM, Logistic Regression widgets. |
| 5. Evaluation | Test & Score, Cross Validation, ROC Curve widgets. |
| 6. Deployment | Prediction widget; then integrate into an application. |

4.14 Case Study — Palmer Penguins

About the dataset: The Palmer Penguins dataset records penguins observed on islands of the Palmer Archipelago, in the Antarctic Peninsula region. Researchers study their behaviour, habitat, population dynamics and the effects of climate change. The dataset is a popular alternative to the famous Iris dataset and is available on Kaggle.

Stage 1 — Problem Scoping

The researchers want to predict the species of Palmer Penguins based on collected data. The dataset has three species — Adelie, Chinstrap, Gentoo — and physical features differ across them.

Stage 2 — Data Acquisition

Features in the dataset include Bill Length, Bill Depth, Flipper Length, Body Mass, Island and Sex.

Stage 3 — Data Exploration

Stage 4 — Modelling
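
To give a feel for what a classification model does with penguin measurements, here is a small Python sketch. It uses k-nearest neighbours, chosen here only because it is easy to read, which is not necessarily the model used in the Orange workflow. The measurements are invented for illustration and are not rows from the real dataset. Only two features (bill length and flipper length, both in mm) are used so that no feature scaling is needed.

```python
import math
from collections import Counter

# Invented measurements (bill length mm, flipper length mm) for illustration;
# they are not rows from the real Palmer Penguins dataset.
TRAINING = [
    ((38.0, 185.0), "Adelie"),
    ((39.5, 190.0), "Adelie"),
    ((40.5, 192.0), "Adelie"),
    ((48.5, 195.0), "Chinstrap"),
    ((50.0, 198.0), "Chinstrap"),
    ((51.0, 200.0), "Chinstrap"),
    ((46.0, 215.0), "Gentoo"),
    ((47.5, 218.0), "Gentoo"),
    ((49.0, 222.0), "Gentoo"),
]

def classify(sample, training=TRAINING, k=3):
    """Predict a species by majority vote among the k nearest training rows."""
    by_distance = sorted(training, key=lambda row: math.dist(row[0], sample))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(classify((39.0, 188.0)))  # short bill, short flipper
print(classify((50.5, 199.0)))  # long bill, medium flipper
print(classify((47.0, 220.0)))  # long bill, long flipper
```

In Orange the same idea is expressed by connecting the File widget to a model widget, with species selected as the target variable.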

Stage 5 — Evaluation

Stage 6 — Prediction / Deployment

4.15 Limitations of No-Code AI Tools

Quick Revision — Key Points to Remember

  • Data Science = unifies statistics + data analysis + ML + their related methods.
  • Applications: internet search (Google processes 20 PB/day), targeted ads, recommendations (Amazon/Netflix), genetics & genomics, healthcare, fraud detection, stock market, weather.
  • 3 coding approaches: High-Code (manual coding) · Low-Code (visual + some code) · No-Code (drag-drop only, no coding).
  • Why No-Code AI: no code errors, saves cost, no AI hiring, easy to use, visual, fast prototyping.
  • Who uses No-Code: non-technical people — doctors, architects, musicians, teachers.
  • Benefits: accessible, cost-effective, time-saving, visual, low-risk, easy iteration.
  • Disadvantages: lack of flexibility · automation bias · security issues.
  • Popular No-Code Tools: Azure ML (2014) · Google Cloud AutoML (2018) · Orange Data Mining (1996) · Lobe AI (2015) · Teachable Machine (2017) · DataRobot (2012).
  • Kayla's Zoo Example: food-price prediction built in Orange without coding using 7 steps.
  • Statistics concepts: Population & Sample · Mean (average) · Median (middle) · Mode (most frequent) · Distribution (frequency chart) · Normal distribution · Probability · Variance · Standard Deviation · Outlier.
  • MS Excel: Analysis ToolPak add-in → scatter plot → trendline → regression equation (y = mx + c).
  • Orange Data Mining = open-source drag-and-drop data-mining tool by Uni of Ljubljana.
  • Orange widget categories: Data Loading (File/URL/Data Table) · Exploration (Scatter Plot, Distributions) · Preprocessing (Impute, Normalize) · Feature Selection · Modelling (Classification Tree, k-Means, SVM, Linear/Logistic Regression) · Evaluation (Test & Score, Cross Validation, ROC) · Visualization (Bar Chart, Heat Map).
  • AI Project Cycle in Orange: Problem Scoping → File widget → Data Table/Scatter → Model widget → Test & Score → Prediction.
  • Palmer Penguins Case Study: classify species (Adelie, Chinstrap, Gentoo) using Bill Length, Bill Depth, Flipper Length, Body Mass, Island, Sex.
  • No-Code limitations: limited customisation, scalability issues, security concerns, automation bias.
©2026 VM Technologies · Vivek Maheshwari (MCA)