Introduction — Why Orange?
- Students will learn to use Orange's intuitive interface across the domains of Data Science, Computer Vision and Natural Language Processing.
- Through hands-on projects & case studies, they gain practical insights into widget usage for data visualisation, preprocessing, feature selection, modelling and evaluation.
Prerequisites: Awareness of Data Science, NLP and Computer Vision concepts + basic knowledge of ML algorithms.
4.1 What Is Data Mining?
Data mining is the process of discovering trends, useful information and patterns in large datasets.
4.2 Introduction to Orange Data Mining Tool
In Orange, the components are called widgets. Their functions range from basic data visualisation, subset selection and preprocessing to predictive modelling and empirical evaluation of learning algorithms. Visual programming: workflows are built by connecting widgets on a canvas.
4.3 Beneficiaries of Orange Data Mining
| User Group | How Orange Helps |
|---|---|
| Data Analysts & Scientists | User-friendly interface — accessible even without extensive programming skills. |
| Researchers | Explore research data, test hypotheses, generate insights from experimental results. |
| Educators & Students | Intuitive interface + visual programming — introduce complex topics approachably. |
| Business Professionals | Identify trends, predict customer behaviour, optimise processes, improve performance. |
| Open-Source Community | Source code freely available; large community of contributors. |
1.1 Getting Started with Orange — Installation
🌐 1. Visit the Orange Website
Go to the official site: orangedatamining.com/download/.
💻 2. Choose the Correct Version
- Windows users: download the Standalone Installer.
- Mac users: select the build that matches your processor (Orange for Apple Silicon on M-series Macs, otherwise the Intel build).
⬇️ 3. Download the Installer
Click the respective download link to begin the download.
⚙️ 4. Install the Software
- Windows: double-click the installer and follow on-screen instructions.
- Mac: mount the disk image and drag the Orange application into the Applications folder.
🚀 5. Launch Orange
Start the tool from the system's Applications menu or its desktop icon.
1.2 Components of the Orange Data Mining Tool
- Blank Canvas — the workspace where you build analysis workflows by dragging and dropping widgets. Arrange and connect widgets to form a data-processing pipeline from input to output.
- Widgets — graphical elements that perform specific tasks or operations on data (e.g., File, Data Table, Scatter Plot).
- Connectors — lines that link widgets on the canvas, representing the flow of data from one widget's output to another's input.
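The relationship between canvas, widgets and connectors can be mimicked in a few lines of Python. This is a toy model only: the `Widget` class and its `connect` and `send` methods are hypothetical names invented for illustration, not Orange's scripting API, and the three data rows are made up.

```python
class Widget:
    """Toy stand-in for an Orange widget: receives data, processes it, passes it on."""

    def __init__(self, name, process):
        self.name = name
        self.process = process      # function applied to incoming data
        self.outputs = []           # downstream widgets (the "connectors")
        self.result = None          # last value this widget produced

    def connect(self, other):
        """Draw a connector from this widget's output to another widget's input."""
        self.outputs.append(other)
        return other                # returning `other` allows chaining

    def send(self, data):
        """Process incoming data and push the result along every connector."""
        self.result = self.process(data)
        for widget in self.outputs:
            widget.send(self.result)


# Made-up rows standing in for a loaded data file
rows = [
    {"petal_length": 1.4, "species": "setosa"},
    {"petal_length": 4.7, "species": "versicolor"},
    {"petal_length": 6.0, "species": "virginica"},
]

# Hypothetical three-widget workflow: load rows -> keep long petals -> count rows
file_widget = Widget("File", lambda _: rows)
select_rows = Widget("Select Rows", lambda data: [r for r in data if r["petal_length"] > 2.0])
counter = Widget("Count", len)

file_widget.connect(select_rows).connect(counter)
file_widget.send(None)              # start the flow at the source widget
print(counter.result)               # number of rows that reached the last widget
```

As on the real canvas, nothing happens until data enters the first widget; each connector then forwards one widget's output to the next widget's input.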
1.3 Default Widget Catalog — 6 Categories
📥 1. Data Widgets
Used for data manipulation — load, store and read datasets.
- File — reads an input data file (table of instances) and sends it to its output channel.
- Data Table — displays attribute-value data in a spreadsheet view.
- SQL Table — reads data directly from an SQL database.
🔄 2. Transform Widgets
Apply various transformations to the dataset within the workflow — e.g., select columns, continuise, discretise, normalise, concatenate, impute, and preprocess.
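To make three of these transformations concrete, here is a minimal sketch in plain Python. It assumes mean imputation, min-max normalisation and equal-width discretisation, which are common choices but not necessarily the settings Orange applies by default. The function names and the sample column are made up for illustration.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def normalise_min_max(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    low, high = min(values), max(values)
    if high == low:                       # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def discretise_equal_width(values, bins):
    """Assign each value to one of `bins` equal-width intervals (0 .. bins-1)."""
    low, high = min(values), max(values)
    if high == low:
        return [0 for _ in values]
    width = (high - low) / bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - low) / width), bins - 1) for v in values]


column = [2.0, None, 4.0, 6.0]            # made-up column with one missing value
filled = impute_mean(column)              # [2.0, 4.0, 4.0, 6.0]
scaled = normalise_min_max(filled)        # [0.0, 0.5, 0.5, 1.0]
binned = discretise_equal_width(filled, bins=2)   # [0, 1, 1, 1]
print(filled, scaled, binned)
```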
📈 3. Visualize Widgets
Tools for visualising data — scatter plots, bar charts, heat maps, box plots, distributions, histograms, parallel coordinate plots.
🤖 4. Model Widgets
Apply machine-learning algorithms — classification, regression, clustering, anomaly detection — to build predictive models and analyse data patterns.
📊 5. Evaluate Widgets
Evaluate model performance via cross-validation, confusion matrix, ROC analysis, Test & Score, Predictions, etc.
🔍 6. Unsupervised Widgets
Facilitate exploratory data analysis & pattern recognition without labelled data — clustering, dimensionality reduction, association-rule mining.
2.1 Three Key Domains of AI with Orange
- Data Science with Orange
- Computer Vision with Orange
- Natural Language Processing with Orange
2.2 Data Science with Orange — The Iris Flower Case Study
The violet-coloured iris comes in three main types: Iris Setosa · Iris Versicolor · Iris Virginica. They differ mainly in the length and width of their sepals and petals. We'll use Orange to explore these measurements and classify the flowers.
🌸 A. Data Visualization — Exploring Iris Dimensions
- Launch Orange — opens a blank canvas.
- File widget — drag onto the canvas → double-click → select the iris dataset. Set the iris column role to target.
- Data Table widget — connect File → Data Table to view the 150 samples in tabular form.
- Scatter Plot widget — connect File → Scatter Plot; select variables like sepal length vs sepal width. Each point = one iris sample.
- Experiment with Histograms, Box Plots, Parallel Coordinate Plots for extra perspectives.
🌳 B. Classification with the Tree Widget
- Prepare testing data in a spreadsheet — columns for sepal length, sepal width, petal length, petal width (same names as training, cm units).
- Tree widget — drag it onto the canvas; connect the training File → Tree.
- Predictions widget — connect Tree → Predictions (this passes on the trained model), then connect a second File (testing data) → Predictions.
- Double-click Predictions — Orange displays the predicted class for each test sample.
Data Table is optional — it just displays data; not required to connect for classification.
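A classification tree is built from threshold questions such as "is the petal longer than X cm?". The sketch below learns a single such question (a one-level tree, often called a decision stump) from made-up petal lengths. It illustrates the idea only, it is not how Orange's Tree widget is implemented, and `fit_stump` and `predict` are hypothetical names.

```python
from collections import Counter


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(values, labels):
    """Learn a one-level tree: pick the threshold that makes the fewest mistakes.

    Assumes at least two distinct values.
    Returns (threshold, label_if_at_or_below, label_if_above).
    """
    best = None
    ordered = sorted(set(values))
    # Candidate thresholds sit halfway between neighbouring distinct values
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict(stump, value):
    """Apply the learned question to one new measurement."""
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label


# Made-up petal lengths in cm (training data)
train_values = [1.0, 2.0, 5.0, 6.0]
train_labels = ["setosa", "setosa", "virginica", "virginica"]

stump = fit_stump(train_values, train_labels)
predictions = [predict(stump, v) for v in [1.5, 5.5]]   # testing data
print(stump, predictions)
```

A full tree repeats this search inside each branch, which is how it can separate three species using several measurements.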
🧪 C. Evaluating the Classification Model
- Test & Score widget — connect both the File (data) and the Tree (learner) to it. Orange uses cross-validation with 10 subsets (folds) by default.
- Inspect Accuracy · Precision · Recall · F1 Score. In the Iris example, accuracy comes out to ≈ 93 %.
- Confusion Matrix widget — connect Test & Score → Confusion Matrix. Reveals TP / TN / FP / FN for each class, showing (e.g.) that Setosa is separated cleanly while Versicolor and Virginica overlap.
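The scores listed above all come from counting how actual and predicted labels line up. The following sketch computes them by hand for six made-up flowers, treating one class at a time as "positive" (one-vs-rest). The function names are illustrative, and the numbers have nothing to do with the ≈ 93 % figure from the real Iris run.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs, i.e. the cells of a confusion matrix."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def class_metrics(actual, predicted, positive):
    """Precision, recall and F1 for one class treated as 'positive'."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for six flowers
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]

matrix = confusion_counts(actual, predicted)
acc = accuracy(actual, predicted)
precision, recall, f1 = class_metrics(actual, predicted, "versicolor")
print(acc, precision, recall, f1)
```

In this toy data Setosa is never confused with anything, while Versicolor and Virginica are each mistaken for the other once, which mirrors the overlap described above.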
🔹 Cross-Validation (Rotation Estimation)
Cross-validation (a.k.a. rotation estimation) — a resampling method that splits the data into several folds and, in each iteration, trains the model on all folds but one and tests it on the held-out fold. Default in Orange = 10-fold CV.
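The splitting itself is simple bookkeeping. Here is a minimal sketch, assuming 150 samples and 10 folds as in the Iris example. The helper name `k_fold_indices` is made up, and the sketch skips the shuffling and stratification that real tools usually apply first.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k folds of near-equal size.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample each
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


splits = k_fold_indices(n_samples=150, k=10)
for train, test in splits[:2]:
    print(len(train), len(test))    # 135 training and 15 testing samples per fold
```

Every sample is used for testing exactly once and for training in the other nine iterations, and the reported score is the average over all folds.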
2.3 Computer Vision with Orange — Dogs vs Cats Clustering
➕ Step 1 · Install Image Analytics Add-On
Go to Options → Add-ons → Image Analytics → install → restart Orange. The image widgets now appear in the side panel.
📂 Step 2 · Import Images
Drag the Import Images widget onto the canvas and upload the folder containing dog and cat images (CBSE provides a sample dataset via the Handbook's Google-Drive link).
🖼️ Step 3 · Image Viewer
Add the Image Viewer widget, connect Import Images → Image Viewer, and double-click to browse all the thumbnails.
🔢 Step 4 · Image Embedding
Connect the Image Embedding widget to Import Images. It sends each image to a server where a deep neural network trained on millions of real-life images converts it into a numerical vector (embedding).
📏 Step 5 · Distance (Cosine)
Connect the Distance widget to the output of Image Embedding. Double-click and select Cosine distance — usually the best-working option for images.
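Cosine distance compares the direction of two vectors and ignores their length. A minimal sketch follows, with three-number vectors standing in for embeddings. The values are made up, and real embeddings are far longer.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity: 0 for vectors pointing the same way, 1 for perpendicular ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical stand-ins for image embeddings
cat_a = [0.9, 0.1, 0.0]
cat_b = [1.8, 0.2, 0.0]   # same direction as cat_a, twice the length
dog_a = [0.0, 0.2, 0.9]

print(cosine_distance(cat_a, cat_b))   # very close to 0: same direction
print(cosine_distance(cat_a, dog_a))   # close to 1: almost perpendicular
```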
🌲 Step 6 · Hierarchical Clustering
Drag the Hierarchical Clustering widget and connect the Distance matrix to it. Double-click to see the dendrogram — a tree showing how images group by similarity.
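The dendrogram records a sequence of merges: the two closest clusters are joined, then the next closest, until one cluster remains. The sketch below reproduces that process on four made-up items, assuming average linkage, which is one of several possible linkage rules and not necessarily the widget's default. The function name `agglomerative` is illustrative.

```python
def agglomerative(names, distance):
    """Repeatedly merge the two closest clusters and record each merge.

    `distance(a, b)` returns the distance between two item names.
    Returns the merges in order, as (cluster_a, cluster_b, linkage_distance).
    """
    clusters = [(name,) for name in names]
    merges = []

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs
        return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair among the current clusters
        pairs = [(linkage(c1, c2), i, j)
                 for i, c1 in enumerate(clusters)
                 for j, c2 in enumerate(clusters) if i < j]
        dist, i, j = min(pairs)
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


# Made-up one-number "embeddings" so the distances are easy to check by hand
position = {"cat1": 0.0, "cat2": 1.0, "dog1": 10.0, "dog2": 12.0}
merges = agglomerative(list(position), lambda a, b: abs(position[a] - position[b]))
for cluster_a, cluster_b, dist in merges:
    print(cluster_a, "+", cluster_b, "at distance", dist)
```

The two cats merge first, then the two dogs, and only at a much larger distance do the two groups join. That last, tall merge is what appears as the top split of the dendrogram.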
👁️ Step 7 · Visualise Clusters
Use the Image Viewer to explore each cluster selected on the dendrogram — you'll notice dogs grouped together and cats in another cluster.
2.4 Natural Language Processing with Orange
➕ Step 1 · Install Text Add-On
Options → Add-ons → Text → install → restart Orange to activate the NLP widgets.
📚 Step 2 · Load or Create Textual Data
- Corpus widget — to load a prepared dataset (e.g., articles, reviews).
- Create Corpus widget — to type your own text directly into Orange.
🔎 Step 3 · Corpus Viewer
Connect Corpus Viewer to browse through the text, search for specific words (which it highlights) and preview documents.
☁️ Step 4 · Word Cloud — Visualise Word Frequencies
Connect the Word Cloud widget to the Corpus output. More frequent words are shown larger — a quick glimpse of prominent themes in the text.
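Behind a word cloud is a frequency count. The sketch below counts words in a made-up sentence and turns each count into a font size. The linear scaling and the `max_size` parameter are assumptions for illustration, and Orange's own sizing rule may differ.

```python
from collections import Counter


def word_sizes(text, max_size=40):
    """Map each word to a font size proportional to how often it occurs."""
    counts = Counter(text.lower().split())
    top = max(counts.values())
    return {word: round(max_size * n / top) for word, n in counts.items()}


sizes = word_sizes("the rabbit and the turtle and the race")
print(sizes)   # 'the' gets the largest size because it occurs most often
```

On raw text the biggest words tend to be fillers such as "the" and "and", which is why the text is preprocessed before the cloud is drawn again.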
🧹 Step 5 · Preprocess Text
Connect the Preprocess Text widget. It performs text normalisation:
- Convert text to lowercase.
- Tokenise into individual words.
- Remove punctuation.
- Filter out stop words (the, is, a, an…).
- Optionally apply stemming or lemmatisation to reduce words to their base form.
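The first four steps can be sketched with the Python standard library alone. The stop-word set here is a tiny illustrative list, the function name `preprocess` is made up, and the optional stemming or lemmatisation step is left out.

```python
import string

# Tiny illustrative stop-word list; real lists are much longer
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, tokenise on whitespace, strip punctuation, drop stop words."""
    lowered = text.lower()
    tokens = lowered.split()
    # Strip punctuation from both ends of each token, then discard empty leftovers
    stripped = [t.strip(string.punctuation) for t in tokens]
    cleaned = [t for t in stripped if t]
    return [t for t in cleaned if t not in stop_words]


tokens = preprocess("The Turtle is slow, and the Rabbit is fast!")
print(tokens)   # ['turtle', 'slow', 'rabbit', 'fast']
```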
☁️ Step 6 · Visualise Cleaned Text
Connect Preprocess Text → Word Cloud again. Now only meaningful words appear — the main themes stand out clearly (e.g., in a story about a race, "Turtle" and "Rabbit" appear largest).
Quick Check: One-Line Answers
- Widget to see Accuracy, Precision, Recall, F1 Score → Test & Score.
- Widget giving detailed TP/TN/FP/FN breakdown → Confusion Matrix.
- Widget performing text normalisation (lowercase, tokenise, stop-word removal) → Preprocess Text.
- Cross-validation is also called → Rotation Estimation.
- Word Cloud — more frequent words appear larger → True.
- Widget to convert raw images into numerical vectors → Image Embedding.
- Widget to compare embeddings & compute similarities → Distance.
- Lines linking widgets on the canvas → Connectors.
- Open-source visual-programming tool for data viz + ML + mining → Orange.
- Add-on to cluster images of 2-legged vs 4-legged animals → Image Analytics.
Quick Revision — Key Points to Remember
- Data Mining = discovering trends, useful information and patterns from large datasets.
- Orange Data Mining = free, open-source, component-based visual-programming tool for data-viz, ML, data mining and analysis.
- 5 Beneficiaries: Data Analysts · Researchers · Educators/Students · Business Professionals · Open-Source Community.
- Install: orangedatamining.com/download (Windows: Standalone Installer; Mac: the build matching your processor, Apple Silicon or Intel).
- 3 Components: Blank Canvas · Widgets · Connectors (arrows between widgets showing data flow).
- 6 Widget Categories: Data · Transform · Visualize · Model · Evaluate · Unsupervised.
- Key Data widgets: File · Data Table · SQL Table.
- 3 AI Domains with Orange: Data Science · Computer Vision · NLP.
- Iris Data Science workflow: File → Data Table → Scatter Plot → Tree → Predictions → Test & Score → Confusion Matrix.
- 3 Iris types: Setosa · Versicolor · Virginica (150-sample dataset). Accuracy ≈ 93%.
- Cross-Validation (a.k.a. Rotation Estimation) — default 10 folds in Orange.
- CV workflow (Dogs vs Cats): Image Analytics add-on → Import Images → Image Viewer → Image Embedding → Distance (cosine) → Hierarchical Clustering → Dendrogram.
- Embedding = numerical vector of an image computed by a deep NN trained on millions of images.
- NLP workflow: Text add-on → Corpus / Create Corpus → Corpus Viewer → Word Cloud → Preprocess Text → Word Cloud (cleaned).
- Preprocess Text steps: lowercase · tokenise · remove punctuation · remove stop words · (optional) stemming/lemmatisation.
- Word Cloud: more frequent words appear larger.