VM-LEARNING /class.xii ·track.ai ·ch-b4 session: 2026_27
PART B ▪ UNIT 4
AI with Orange Data Mining Tool
Widgets · Data Science · Computer Vision · NLP (Practical only)
Orange Data Mining — a free, open-source, component-based visual programming software for data visualization, machine learning, data mining, and data analysis. Components are called widgets and are connected on a canvas to build workflows — no coding required.

Introduction — Why Orange?

Prerequisites: Awareness of Data Science, NLP and Computer Vision concepts + basic knowledge of ML algorithms.

4.1 What Is Data Mining?

Data Mining — the process of discovering trends, useful information and patterns from large datasets. It analyses and interprets data to extract meaningful insights that aid decision-making.

4.2 Introduction to Orange Data Mining Tool

Within Orange, components are widgets. Their functionalities range from basic data visualization, subset selection and preprocessing to empirical evaluation of learning algorithms and predictive modelling. Visual programming: workflows are built by interconnecting widgets on a canvas.

4.3 Beneficiaries of Orange Data Mining

User Group | How Orange Helps
Data Analysts & Scientists | User-friendly interface, accessible even without extensive programming skills.
Researchers | Explore research data, test hypotheses, generate insights from experimental results.
Educators & Students | Intuitive interface and visual programming make complex topics easier to introduce.
Business Professionals | Identify trends, predict customer behaviour, optimise processes, improve performance.
Open-Source Community | Source code freely available; large community of contributors.

Learning Outcome 1: Develop proficiency in utilizing the Orange Data Mining tool: navigate its interface, employ its features, and execute data-analysis tasks effectively.

1.1 Getting Started with Orange — Installation

🌐 1. Visit the Orange Website

Go to the official site: orangedatamining.com/download/.

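💻 2. Choose the Correct Version

Pick the installer that matches your operating system, for example the Windows Standalone installer or the Mac Apple Silicon build.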

⬇️ 3. Download the Installer

Click the respective download link to begin the download.

⚙️ 4. Install the Software

🚀 5. Launch Orange

Start the tool from the system's Applications menu or its desktop icon.

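1.2 Components of the Orange Data Mining Tool

Orange has three main components: the blank canvas, widgets, and connectors (arrows between widgets that show the data flow).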

1.3 Default Widget Catalog — 6 Categories

📥 1. Data Widgets

Used for data manipulation — load, store and read datasets.

🔄 2. Transform Widgets

Apply various transformations to the dataset within the workflow — e.g., select columns, continuise, discretise, normalise, concatenate, impute, and preprocess.
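To give a rough idea of what three of these transformations do, here is a minimal sketch in plain Python rather than in Orange itself. The function names and sample values are made up for illustration, and Orange's widgets offer more options than this sketch covers.

```python
# Illustrative sketch only: plain-Python versions of three transformations.
# Function names and sample values are hypothetical, not Orange's API.

def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def normalise_min_max(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def discretise_equal_width(values, bins):
    """Map each value to a bin index 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]
    return [min(int((v - lo) / width), bins - 1) for v in values]

raw = [2.0, None, 4.0, 6.0]                  # one missing measurement
filled = impute_mean(raw)                    # missing value becomes the mean
scaled = normalise_min_max(filled)           # values now lie between 0 and 1
binned = discretise_equal_width(filled, 2)   # two bins: "low" (0) and "high" (1)
print(filled, scaled, binned)
```

In Orange the same ideas are applied by dropping the matching Transform widget between the File widget and whatever comes next in the workflow.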

📈 3. Visualize Widgets

Tools for visualising data — scatter plots, bar charts, heat maps, box plots, distributions, histograms, parallel coordinate plots.

🤖 4. Model Widgets

Apply machine-learning algorithms — classification, regression, clustering, anomaly detection — to build predictive models and analyse data patterns.

📊 5. Evaluate Widgets

Evaluate model performance via cross-validation, confusion matrix, ROC analysis, Test & Score, Predictions, etc.

🔍 6. Unsupervised Widgets

Facilitate exploratory data analysis & pattern recognition without labelled data — clustering, dimensionality reduction, association-rule mining.

Learning Outcome 2: Demonstrate the ability to apply Orange in real-world scenarios across AI domains — Data Science · Computer Vision · Natural Language Processing — through hands-on projects and case studies

2.1 Three Key Domains of AI with Orange

  1. Data Science with Orange
  2. Computer Vision with Orange
  3. Natural Language Processing with Orange

2.2 Data Science with Orange — The Iris Flower Case Study

The violet-coloured iris comes in three main types: Iris Setosa · Iris Versicolor · Iris Virginica. Key differences are in the sepal and petal length/width. We'll use Orange to explore these measurements and classify the flowers.

🌸 A. Data Visualization — Exploring Iris Dimensions

  1. Launch Orange — opens a blank canvas.
  2. File widget — drag onto the canvas → double-click → select the iris dataset. Set the iris column role to target.
  3. Data Table widget — connect File → Data Table to view the 150 samples in tabular form.
  4. Scatter Plot widget — connect File → Scatter Plot; select variables like sepal length vs sepal width. Each point = one iris sample.
  5. Experiment with Histograms, Box Plots, Parallel Coordinate Plots for extra perspectives.

🌳 B. Classification with the Tree Widget

  1. Prepare testing data in a spreadsheet — columns for sepal length, sepal width, petal length, petal width (same names as training, cm units).
  2. Tree widget — drag it onto the canvas; connect the training File → Tree.
  3. Predictions widget: connect Tree → Predictions so the trained model is passed on, then connect a second File (testing data) → Predictions.
  4. Double-click Predictions — Orange displays the predicted class for each test sample.

Data Table is optional — it just displays data; not required to connect for classification.
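The idea behind a tree classifier can be sketched with a single split. The example below is a simplified, assumed illustration in plain Python: it picks one threshold on petal length using Gini impurity and then classifies new values. The measurements are made-up, iris-like numbers, not the real dataset, and the Tree widget itself builds deeper trees with more settings.

```python
# Illustrative sketch only: a one-split "tree" (decision stump) in plain Python.
# The measurements below are made-up, iris-like values, not the real dataset.

def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Try midpoints between sorted values; return the lowest-impurity split."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no midpoint between identical values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[0]:
            best = (score, t)
    return best[1]

def majority(labels):
    """Most common label in a list."""
    return max(set(labels), key=labels.count)

# Hypothetical training rows: petal length in cm and the flower type.
petal_length = [1.4, 1.5, 1.3, 4.5, 4.7, 4.4]
species = ["setosa", "setosa", "setosa",
           "versicolor", "versicolor", "versicolor"]

threshold = best_threshold(petal_length, species)
left_label = majority([s for v, s in zip(petal_length, species) if v <= threshold])
right_label = majority([s for v, s in zip(petal_length, species) if v > threshold])

def predict(value):
    """Classify a new flower by which side of the threshold it falls on."""
    return left_label if value <= threshold else right_label

print(threshold, predict(1.6), predict(4.0))
```

The training rows play the role of the first File widget, and the values passed to predict play the role of the testing data.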

🧪 C. Evaluating the Classification Model

  1. Test & Score widget: connect the File (data) and the Tree (learner) to it. Orange uses cross-validation with 10 subsets by default.
  2. Inspect Accuracy · Precision · Recall · F1 Score. In the Iris example, accuracy comes out to ≈ 93 %.
  3. Confusion Matrix widget — connect Test & Score → Confusion Matrix. Reveals TP / TN / FP / FN for each class, showing (e.g.) that Setosa is separated cleanly while Versicolor and Virginica overlap.
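
As a worked example of how the four scores follow from confusion-matrix counts, here is a small sketch in plain Python. The counts are invented for illustration and are not the Iris results; the function name is hypothetical.

```python
# Illustrative sketch only: scores for one class from confusion-matrix counts.
# The counts are invented for the example; they are not the Iris results.

def class_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for one class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts: 8 true positives, 2 false positives,
# 2 false negatives, 18 true negatives.
metrics = class_metrics(tp=8, fp=2, fn=2, tn=18)
print(metrics)
```

Precision asks "of everything predicted as this class, how much was right?", while recall asks "of everything that really is this class, how much was found?".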
🔹 Cross-Validation (Rotation Estimation)

Cross-validation (a.k.a. rotation estimation): a resampling and sample-splitting method that trains and tests a model on different portions of the data across multiple iterations. Default in Orange = 10-fold CV.
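
The splitting step can be sketched in plain Python as follows. This is an assumed illustration of the general k-fold idea, not Orange's internal code; the function name k_fold_indices and the fixed seed are hypothetical choices.

```python
# Illustrative sketch only: how k-fold cross-validation splits the data.
# The function name k_fold_indices is hypothetical, not part of Orange.
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(n_samples=150, k=10))
for train, test in splits:
    # With 150 samples and 10 folds, each round tests on 15 and trains on 135.
    assert len(test) == 15 and len(train) == 135
print(len(splits), "folds")
```

Each sample is used for testing exactly once, and the scores from the rounds are then averaged.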

Practical — Differentiate fruits vs vegetables by nutrition: collect data on energy, water, protein, fat, carbs, fibre, sugars, calcium, iron, magnesium, phosphorus, potassium, sodium (e.g., from Kaggle). Split into train/test. Train several classifiers via Orange widgets. Evaluate with Accuracy, Precision, Recall, F1.

2.3 Computer Vision with Orange — Dogs vs Cats Clustering

➕ Step 1 · Install Image Analytics Add-On

Go to Options → Add-ons → Image Analytics → install → restart Orange. The image widgets now appear in the side panel.

📂 Step 2 · Import Images

Drag the Import Images widget onto the canvas and upload the folder containing dog and cat images (CBSE provides a sample dataset via the Handbook's Google-Drive link).

🖼️ Step 3 · Image Viewer

Add the Image Viewer widget, connect Import Images → Image Viewer, and double-click to browse all the thumbnails.

🔢 Step 4 · Image Embedding

Connect the Image Embedding widget to Import Images. It sends each image to a server where a deep neural network trained on millions of real-life images converts it into a numerical vector (embedding).

📏 Step 5 · Distance (Cosine)

Connect the Distance widget to the output of Image Embedding. Double-click and select Cosine distance, which generally works well for image embeddings.
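
For intuition, cosine distance compares the direction of two vectors and ignores their length. The sketch below uses the common definition (1 minus cosine similarity) in plain Python; the short vectors are made up and stand in for real embeddings, which are far longer.

```python
# Illustrative sketch only: cosine distance between two vectors.
# Uses the common definition 1 - cosine similarity; the vectors are made up.
import math

def cosine_distance(a, b):
    """Return 1 - (a . b) / (|a| * |b|) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Vectors pointing the same way are close; perpendicular ones are far apart.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])
perpendicular = cosine_distance([1.0, 0.0], [0.0, 1.0])
print(round(same_direction, 6), round(perpendicular, 6))
```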

🌲 Step 6 · Hierarchical Clustering

Drag the Hierarchical Clustering widget and connect the Distance matrix to it. Double-click to see the dendrogram — a tree showing how images group by similarity.
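
The dendrogram records the order in which items are merged. A minimal plain-Python sketch of that bottom-up merging is shown here, assuming average linkage (the widget offers several linkage options). The item names and distance values are invented for illustration.

```python
# Illustrative sketch only: agglomerative (bottom-up) clustering.
# Item names and distances are invented; average linkage is an assumption.

def hierarchical_merges(names, dist):
    """Repeatedly merge the two closest clusters (average linkage).

    Returns the merges in order, which is the information a dendrogram draws.
    """
    clusters = [[i] for i in range(len(names))]   # start: one item per cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_dists = [dist[i][j]
                              for i in clusters[a] for j in clusters[b]]
                d = sum(pair_dists) / len(pair_dists)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(names[i] for i in clusters[a]),
                       sorted(names[i] for i in clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

names = ["dog1", "dog2", "cat1", "cat2"]
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
merges = hierarchical_merges(names, dist)
for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 2))
```

Items that merge early (at a small distance) sit close together at the bottom of the dendrogram; the final merge joins the two main groups.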

👁️ Step 7 · Visualise Clusters

Use the Image Viewer to explore each cluster selected on the dendrogram — you'll notice dogs grouped together and cats in another cluster.

Practical — Cluster images of birds vs animals: collect enough labelled (or unlabelled) images of several species, import into Orange, run Image Embedding → Distance → Hierarchical Clustering, interpret the resulting dendrogram for patterns and similarities.

2.4 Natural Language Processing with Orange

➕ Step 1 · Install Text Add-On

Options → Add-ons → Text → install → restart Orange to activate the NLP widgets.

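📚 Step 2 · Load or Create Textual Data

Use the Corpus widget to load an existing text collection, or the Create Corpus widget to type in your own documents.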

🔎 Step 3 · Corpus Viewer

Connect Corpus Viewer to browse through the text, search for specific words (which it highlights) and preview documents.

☁️ Step 4 · Word Cloud — Visualise Word Frequencies

Connect the Word Cloud widget to the Corpus output. More frequent words are shown larger — a quick glimpse of prominent themes in the text.
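
Behind a word cloud is a simple word count. The sketch below shows that counting step in plain Python with a made-up sentence; it is an illustration of the idea, not the widget's own code.

```python
# Illustrative sketch only: the word counting behind a word cloud.
# The sentence is made up for the example.
from collections import Counter

text = "the turtle and the rabbit ran a race and the turtle won"
counts = Counter(text.split())        # word -> number of occurrences
top_three = counts.most_common(3)     # the words that would be drawn largest
print(top_three)
```

On raw text, a common word such as "the" tends to come out on top, which is why the text is usually cleaned before the cloud is read for themes.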

🧹 Step 5 · Preprocess Text

Connect the Preprocess Text widget. It performs text normalisation:

  1. Convert text to lowercase.
  2. Tokenise into individual words.
  3. Remove punctuation.
  4. Filter out stop words (the, is, a, an…).
  5. Optionally apply stemming or lemmatisation to reduce words to their base form.
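
The first four steps can be sketched in a few lines of plain Python. The stop-word list here is a tiny hypothetical one and stemming/lemmatisation is left out, so treat this as an illustration of the idea rather than what the widget does internally.

```python
# Illustrative sketch only: lowercase, tokenise, drop punctuation and stop words.
# STOP_WORDS is a tiny hypothetical list; real lists are much longer.
import re

STOP_WORDS = {"the", "is", "a", "an", "and"}

def preprocess(text):
    """Return the cleaned list of tokens for one piece of text."""
    lowered = text.lower()                          # 1. lowercase
    tokens = re.findall(r"[a-z0-9']+", lowered)     # 2-3. tokenise, no punctuation
    return [t for t in tokens if t not in STOP_WORDS]   # 4. remove stop words

tokens = preprocess("The Turtle is slow, and the Rabbit is fast!")
print(tokens)
```

Only the content words survive, which is what makes the cleaned word cloud easier to read.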

☁️ Step 6 · Visualise Cleaned Text

Connect Preprocess Text → Word Cloud again. Now only meaningful words appear — the main themes stand out clearly (e.g., in a story about a race, "Turtle" and "Rabbit" appear largest).

Practical — Build your own corpus: pick a story or article, create a corpus in Orange, apply text normalisation (lowercase, tokenise, remove stop words), generate a Word Cloud before and after applying stemming/lemmatisation, and compare the most-frequent words.
Check Your Progress — quick MCQ pointers:
  • Widget to see Accuracy, Precision, Recall, F1 Score → Test & Score.
  • Widget giving detailed TP/TN/FP/FN breakdown → Confusion Matrix.
  • Widget performing text normalisation (lowercase, tokenise, stop-word removal) → Preprocess Text.
  • Cross-validation is also called → Rotation Estimation.
  • Word Cloud: more frequent words appear larger → True.
  • Widget to convert raw images into numerical vectors → Image Embedding.
  • Widget to compare embeddings & compute similarities → Distance.
  • Lines linking widgets on the canvas → Connectors.
  • Open-source visual-programming tool for data viz + ML + mining → Orange.
  • Add-on to cluster images of 2-legged vs 4-legged animals → Image Analytics.

Quick Revision — Key Points to Remember

  • Data Mining = discovering trends, useful information and patterns from large datasets.
  • Orange Data Mining = free, open-source, component-based visual-programming tool for data-viz, ML, data mining and analysis.
  • 5 Beneficiaries: Data Analysts · Researchers · Educators/Students · Business Professionals · Open-Source Community.
  • Install: orangedatamining.com/download — Windows Standalone installer / Mac Apple Silicon.
  • 3 Components: Blank Canvas · Widgets · Connectors (arrows between widgets showing data flow).
  • 6 Widget Categories: Data · Transform · Visualize · Model · Evaluate · Unsupervised.
  • Key Data widgets: File · Data Table · SQL Table.
  • 3 AI Domains with Orange: Data Science · Computer Vision · NLP.
  • Iris Data Science workflow: File → Data Table → Scatter Plot → Tree → Predictions → Test & Score → Confusion Matrix.
  • 3 Iris types: Setosa · Versicolor · Virginica (150-sample dataset). Accuracy ≈ 93%.
  • Cross-Validation (a.k.a. Rotation Estimation) — default 10 folds in Orange.
  • CV workflow (Dogs vs Cats): Image Analytics add-on → Import Images → Image Viewer → Image Embedding → Distance (cosine) → Hierarchical Clustering → Dendrogram.
  • Embedding = numerical vector of an image computed by a deep NN trained on millions of images.
  • NLP workflow: Text add-on → Corpus / Create Corpus → Corpus Viewer → Word Cloud → Preprocess Text → Word Cloud (cleaned).
  • Preprocess Text steps: lowercase · tokenise · remove punctuation · remove stop words · (optional) stemming/lemmatisation.
  • Word Cloud: more frequent words appear larger.