Introduction — Data Science & Its Applications
AI depends entirely on data. Based on the type of data fed to a machine, AI splits into three broad domains — Data Science (statistical data), Computer Vision, and NLP. Data Science is the foundation that makes machines understand numbers, patterns and trends.
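🔹 Real-World Applications of Data Science
- Internet search (Google processes 20 PB/day).
- Targeted advertising.
- Recommendation systems (Amazon, Netflix).
- Genetics and genomics.
- Healthcare.
- Fraud detection.
- Stock-market analysis.
- Weather forecasting.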
4.1 No-Code AI Tool — Introduction
🧑‍💻 High-Code · Low-Code · No-Code Comparison
| Aspect | High-Code | Low-Code | No-Code |
|---|---|---|---|
| What it is | Traditional development — coders write all code manually using Java, Python, C#. | Platforms with visual interfaces + pre-built components. Some manual code still needed. | Create applications without any coding or scripting. Drag-and-drop only. |
| Coding knowledge | Mandatory — deep expertise needed. | Partial — developers write some code. | Not required — anyone can build. |
| Cost | Expensive. | Less expensive than high-code. | Least expensive. |
| Customisation | Full — you own the product, can make anything. | Limited customisation. | Lacks customisable options — limited to tool's built-in functions. |
| Ease of use | Complex — needs coders. | Moderate. | Simple — drag & drop. |
| Example | Custom chatbot built from scratch. | Mendix, OutSystems, Microsoft Power Apps. | Orange Data Mining, Teachable Machine, Lobe. |
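4.2 Why Do We Need No-Code AI?
- No code errors.
- Saves cost.
- No need to hire AI specialists.
- Easy to use.
- Visual.
- Fast prototyping.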
4.3 Who Can Use No-Code AI?
No-Code AI makes AI accessible to the general public. Non-technical people — doctors, architects, musicians, teachers — can quickly construct accurate AI models without any coding.
Kayla manages the food budget at a zoo. With rising prices of meat and vegetables, the zoo wants to predict future food prices to raise sponsorship. Kayla has never coded, but using a No-Code tool like Orange Data Mining, she builds a price-prediction model herself!
4.4 Benefits of No-Code Tools
- Accessibility — anyone, even non-tech people, can use AI.
- Cost-effective — much cheaper than writing code.
- Time-saving — drag-and-drop builds models in minutes.
- Visual learning — easy to understand workflows.
- Real-time feedback — see results as you build.
- Low risk of errors — tool handles coding automatically.
- Easy iteration — change parameters, re-run in seconds.
4.5 Disadvantages of No-Code Tools
- Lack of flexibility.
- Automation bias.
- Security issues.
4.6 Popular No-Code AI Tools
| Tool | Released | Details |
|---|---|---|
| Azure Machine Learning | July 2014 | Cloud-based service by Microsoft. Build ML models without coding, clean data, train and evaluate models, put into production. |
| Google Cloud AutoML | January 2018 | Users with limited ML knowledge can train high-quality custom models specific to business needs. Build in minutes and use in apps and websites. |
| Orange Data Mining | October 1996 | Open-source data-visualisation + ML + data-mining toolkit by University of Ljubljana. Perform data analysis through drag-and-drop widgets. |
| Lobe AI | 2015 | Machine-learning platform to create custom ML models using a visual interface. Train models with a free, easy-to-use tool that auto-trains a custom model shippable in an app. |
| Teachable Machine | November 2017 | Web-based tool by Google. Train a computer to recognise your own images, sounds and poses. No expertise or coding required. |
| DataRobot | 2012 | Automated ML platform for enterprise-grade models. |
4.7 Building a Simple Price Prediction Model — Orange Data Mining
Let's help Kayla build a food-price prediction model in Orange Data Mining.
| Step | Action |
|---|---|
| Step 1 | Download the dataset from fao.org/worldfoodsituation/foodpricesindex/ |
| Step 2 | Double-click the Orange icon to open the tool. |
| Step 3 | Click the File widget under Data Menu → appears on canvas. Click to browse and upload the dataset. Select Food Price Index as the target variable. |
| Step 4 | Click the Data Table widget under Data Menu. Connect File → Data Table. Click to view the dataset. |
| Step 5 | Click Linear Regression widget under Model Menu. Connect File → Linear Regression. |
| Step 6 | Click Test & Score widget under Evaluate Menu. Connect File + Linear Regression → Test & Score. View performance parameters. |
| Step 7 | Click Prediction widget under Evaluate Menu. Connect Test & Score → Prediction. Click to view predicted prices. |
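
For readers curious about what the workflow above does behind the drag-and-drop, here is a minimal Python sketch of the same idea: fit a straight line on some rows, predict the held-back rows, and measure the error. The numbers, the single "meat price index" feature and the 6/2 hold-out split are all assumptions made for illustration; they are not taken from the FAO dataset, and Orange's own evaluation settings may differ.

```python
import math

# Hypothetical rows: (meat price index, overall food price index). Made-up numbers.
rows = [(90, 92.0), (95, 96.5), (100, 101.0), (105, 104.0),
        (110, 109.5), (115, 113.0), (120, 119.0), (125, 122.5)]

train, test = rows[:6], rows[6:]   # simple hold-out split: fit on 6 rows, test on 2


def fit_line(points):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)

# Predict the held-back rows and compare with the actual values.
predictions = [slope * x + intercept for x, _ in test]
errors = [pred - actual for pred, (_, actual) in zip(predictions, test)]

mae = sum(abs(e) for e in errors) / len(errors)              # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))   # root mean squared error
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

Lower MAE and RMSE mean the predicted prices sit closer to the actual ones.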
4.8 Other No-Code Tools — Lobe & Teachable Machine
🔹 Lobe
- Makes ML easy with everything needed to turn ideas into models.
- Train models with a free, easy-to-use tool.
- Auto-trains a custom ML model that can be shipped in your app.
🔹 Teachable Machine
- Web-based tool by Google.
- Train a computer to recognise images, sounds and poses.
- Fast, easy and accessible to everyone: no expertise or coding required.
4.9 Important Concepts in Statistics
📊 1. Statistical Sampling
- Population = the entire set of raw data available for a test or experiment.
- You cannot always measure patterns and trends across the entire population.
- Take a sample — a portion of the population — and perform computations on it.
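
A small Python sketch of the sampling idea above. The population here is made-up (random visitor counts), and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the sketch gives the same result every run

# Hypothetical population: visitor counts for 1,000 days (made-up numbers).
population = [rng.randint(100, 500) for _ in range(1000)]

# Sample: 50 values drawn at random from the population, without replacement.
sample = rng.sample(population, k=50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.1f}")
print(f"Sample mean:     {sample_mean:.1f}")
```

The sample mean is usually close to the population mean, which is why computing on a sample is often good enough.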
🎯 2. Descriptive Statistics — Mean, Median, Mode
Descriptive statistics describe the data and help you understand its underlying characteristics. Example for the values 2, 3, 3, 5, 7, 8, 9:
• Mean = (2+3+3+5+7+8+9) / 7 = 37 / 7 ≈ 5.29
• Median = 5 (middle value)
• Mode = 3 (appears twice)
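
The same three measures can be checked with Python's built-in `statistics` module, using the seven values from the Mean line above:

```python
import statistics

values = [2, 3, 3, 5, 7, 8, 9]

mean = statistics.mean(values)      # sum of values divided by how many there are
median = statistics.median(values)  # middle value of the sorted list
mode = statistics.mode(values)      # value that appears most often

print(round(mean, 2), median, mode)  # 5.29 5 3
```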
📈 3. Distributions
- Distribution = charts/graphs that display the frequency of each value in a dataset.
- Some distributions contain numbers much larger than others → skewed distribution.
- Normal Distribution = symmetrical, bell-shaped, with most values around the central peak.
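
A quick way to see a distribution is to count how often each value appears. The sketch below uses made-up test scores chosen so that the counts rise to a central peak and fall again, roughly like a bell shape.

```python
from collections import Counter

# Hypothetical test scores (made-up data for illustration).
scores = [4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 10]

frequency = Counter(scores)   # maps each value to the number of times it appears

# Text histogram: one '#' per occurrence of the value.
for value in sorted(frequency):
    print(f"{value:>2} | {'#' * frequency[value]}")
```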
🎲 4. Probability
- Probability = likelihood of an event occurring.
- An event is the outcome of an experiment.
- Events can be independent (one doesn't affect the other) or dependent.
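
As an illustration of independent events, the sketch below lists every outcome of tossing a coin and rolling a die, then counts favourable outcomes. The coin-and-die experiment is an example chosen here; it does not come from the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) pair from one coin toss and one die roll.
outcomes = list(product(["H", "T"], range(1, 7)))   # 12 equally likely outcomes


def probability(event):
    """Share of outcomes in the sample space for which event(outcome) is True."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))


p_heads = probability(lambda o: o[0] == "H")                # 1/2
p_six = probability(lambda o: o[1] == 6)                    # 1/6
p_both = probability(lambda o: o[0] == "H" and o[1] == 6)   # 1/12

# For independent events, P(A and B) equals P(A) * P(B).
print(p_heads, p_six, p_both, p_both == p_heads * p_six)
```

The coin does not affect the die, so the probability of both events happening is simply the product of the two.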
📉 5. Variance, Standard Deviation, Outlier
- Variance = a measure of how spread out the numbers are: the average of the squared distances of each value from the mean.
- Standard Deviation = the square root of the variance; a single value, in the same units as the data, that represents how widely the values are spread around the mean.
- Outlier = a data point that lies at an abnormal distance from other values.
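
A minimal sketch of these three ideas in Python. The data is made-up, with one deliberately extreme value. The "more than 2 standard deviations from the mean" rule used to flag outliers is a common rule of thumb assumed here, not a definition from the notes.

```python
import statistics

values = [2, 3, 3, 5, 7, 8, 9, 40]   # hypothetical data; 40 is deliberately extreme

mean = statistics.mean(values)
variance = statistics.pvariance(values)   # average squared distance from the mean
std_dev = statistics.pstdev(values)       # square root of the variance

# Assumed rule of thumb: a value is an outlier if it lies more than
# 2 standard deviations away from the mean.
outliers = [v for v in values if abs(v - mean) > 2 * std_dev]

print(f"mean = {mean:.2f}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
print("outliers:", outliers)
```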
4.10 MS Excel for Statistical Analysis
MS Excel is one of the simplest and most widely available statistical tools. With the Analysis ToolPak add-in, Excel can perform regression, histograms, descriptive statistics, and much more.
- Step 1 — Get the Add-in: File → Options → Add-ins → Analysis ToolPak → Go → Check → OK. The Data Analysis option appears in the Data menu.
- Step 2 — View Data: Identify independent (X = Speed) and dependent (Y = Distance) features.
- Step 3 — Visualise: Select both columns → Insert → Charts → Scatter. Add chart title "Distance vs Speed".
- Step 4 — Add Regression Line: Click scatter plot → Chart Design → Add Chart Element → Trendline → More Trendline Options → Linear → tick "Display Equation on chart" and "Display R-squared value on chart".
- Step 5 — Verify Coefficients: Data → Data Analysis → Regression → Y Range (Distance column with label) → X Range (Speed column with label) → tick Labels → choose output cell → OK. Summary stats appear.
- Step 6 — Predict: Use the generated equation y = mx + c. For Speed = 6, calculate Distance.
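
The trendline equation in Steps 4 to 6 is a least-squares straight line, and the same calculation can be sketched in a few lines of Python. The Speed and Distance readings below are hypothetical stand-ins, not the worksheet's actual data.

```python
# Hypothetical Speed (X) and Distance (Y) readings; not the worksheet's real data.
speed = [1, 2, 3, 4, 5]
distance = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(speed)
mean_x = sum(speed) / n
mean_y = sum(distance) / n

# Slope m and intercept c of the least-squares line y = m*x + c.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(speed, distance))
sxx = sum((x - mean_x) ** 2 for x in speed)
m = sxy / sxx
c = mean_y - m * mean_x

# R-squared: share of the variation in Y that the line explains.
ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(speed, distance))
ss_tot = sum((y - mean_y) ** 2 for y in distance)
r_squared = 1 - ss_res / ss_tot

predicted = m * 6 + c   # Step 6: Distance for Speed = 6
print(f"y = {m:.2f}x + {c:.2f}, R^2 = {r_squared:.4f}, Distance at Speed 6 = {predicted:.2f}")
```

With real data, `m` and `c` should match the equation Excel displays on the chart, up to rounding.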
4.11 Orange Data Mining — What is It?
🔹 Key Features of Orange
- A machine-learning tool for data analysis through Python + visual programming.
- Perform operations on data through simple drag-and-drop steps.
- Visualise data; perform data mining and machine learning.
- No code required — use without writing a single line.
- Relatively easy with beautiful visuals.
- Open-source and free.
4.12 Orange Widgets — By Category
📥 1. Data Loading Widgets
Bring your data into Orange from files or online sources:
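- File: loads data from local files such as CSV, tab-delimited text or Excel (database tables are loaded with the separate SQL Table widget).
- URL: loads data from a web address, entered in the File widget's URL field.
- Data Table: displays loaded data in tabular format.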
🔍 2. Data Exploration Widgets
Look at data in different ways to spot patterns:
- Scatter Plot — visualises the relationship between two variables.
- Data Table — manual inspection and exploration.
- Distributions — histograms and statistical distributions of variables.
🧼 3. Preprocessing Widgets
Clean up data before modelling:
- Impute — handles missing values.
- Normalize — scales data to a common scale.
- Select Columns — pick specific columns.
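
To make the Impute and Normalize ideas concrete, here is a small Python sketch. It assumes two common strategies, filling gaps with the mean and min-max scaling to the range 0 to 1; Orange offers other options too. The body-mass numbers are made-up.

```python
import statistics

# Hypothetical column with missing entries recorded as None.
body_mass = [3750.0, None, 3250.0, 4675.0, None, 3800.0]

# Impute: replace each missing value with the mean of the known values.
known = [v for v in body_mass if v is not None]
fill_value = statistics.mean(known)
imputed = [fill_value if v is None else v for v in body_mass]

# Normalize: min-max scaling maps the smallest value to 0 and the largest to 1.
lowest, highest = min(imputed), max(imputed)
normalized = [(v - lowest) / (highest - lowest) for v in imputed]

print(imputed)
print([round(v, 3) for v in normalized])
```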
🎯 4. Feature Selection Widgets
- Select Columns — choose relevant features.
- Select Best Features — auto-selects best features using criteria like mutual information or correlation.
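
One simple way to pick features by correlation is to score each feature against the target and rank the scores. The sketch below computes the Pearson correlation by hand on made-up measurements; the feature names and numbers are hypothetical, and this is not necessarily how Orange scores features internally.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical feature columns and target (body mass in grams).
features = {
    "flipper_length_mm": [181, 186, 195, 210, 217, 230],
    "bill_depth_mm": [18.7, 18.4, 18.0, 16.3, 15.9, 15.2],
    "random_noise": [3, 9, 1, 7, 2, 8],
}
target = [3700, 3800, 4000, 4600, 5000, 5600]

scores = {name: pearson(column, target) for name, column in features.items()}

# Rank by absolute correlation: strong negative links count as much as positive ones.
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)

for name in ranked:
    print(f"{name:<18} r = {scores[name]:+.3f}")
```

Features near the top of the ranking carry more information about the target; the noise column ends up last.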
🤖 5. Modelling Widgets
Build models from your data:
- Classification Tree — decision-tree classifier.
- k-Means — clustering algorithm.
- Support Vector Machine (SVM) — classifier.
- Logistic Regression — classification model.
- Linear Regression — regression model.
📊 6. Evaluation Widgets
Check model performance:
- Test & Score — evaluates model on a test dataset.
- Cross Validation — assesses model performance.
- ROC Curve — plots receiver-operating-characteristic curve for binary classifiers.
📈 7. Visualization Widgets
Turn data into visual representations:
- Bar Chart — displays data in bar-chart format.
- Heat Map — visualises data using a heatmap.
- Scatter Plot — 2-variable relationship.
4.13 AI Project Cycle Mapped to Orange Data Mining
| AI Project Cycle Stage | Orange Widget / Action |
|---|---|
| 1. Problem Scoping | Define the problem statement (done outside Orange — what are you predicting?). |
| 2. Data Acquisition | File / URL widget to load the dataset. |
| 3. Data Exploration | Data Table, Scatter Plot, Distributions widgets to explore. |
| 4. Modeling | Linear Regression, Classification Tree, k-Means, SVM, Logistic Regression widgets. |
| 5. Evaluation | Test & Score, Cross Validation, ROC Curve widgets. |
| 6. Deployment | Prediction widget; then integrate into an application. |
4.14 Case Study — Palmer Penguins
🎯 Stage 1 — Problem Scoping
The researchers want to predict the species of Palmer Penguins based on collected data. The dataset has three species — Adelie, Chinstrap, Gentoo — and physical features differ across them.
📥 Stage 2 — Data Acquisition
Features in the dataset include:
- Species — Adelie / Chinstrap / Gentoo (target label).
- Bill Length (mm).
- Bill Depth (mm).
- Flipper Length (mm).
- Body Mass (g).
- Island — where the penguin was observed.
- Sex — male / female.
🔍 Stage 3 — Data Exploration
- Load dataset via File widget → connect to Data Table to inspect.
- Use Scatter Plot — check how Bill Length vs Flipper Length clusters the three species.
- Use Distributions — see how Body Mass differs across species.
- Look for missing values → use Impute if needed.
🤖 Stage 4 — Modelling
- Since "species" is a category, this is a Classification problem.
- Use widgets like Classification Tree, k-Nearest Neighbours, or Logistic Regression.
- Connect the File widget → Test & Score and the classifier widget → Test & Score, so that Test & Score receives both the data and the learner.
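
To show what a classifier does with such data, here is a minimal k-Nearest Neighbours sketch in plain Python. The nine training rows are made-up measurements (not rows from the real Palmer Penguins dataset), only two features are used, and k = 3 is an arbitrary choice.

```python
import math
from collections import Counter

# Hypothetical training rows: (bill_length_mm, flipper_length_mm) and species.
training = [
    ((39.1, 181.0), "Adelie"),
    ((38.6, 190.0), "Adelie"),
    ((40.3, 186.0), "Adelie"),
    ((49.5, 196.0), "Chinstrap"),
    ((50.2, 198.0), "Chinstrap"),
    ((48.7, 193.0), "Chinstrap"),
    ((46.1, 215.0), "Gentoo"),
    ((47.5, 220.0), "Gentoo"),
    ((45.8, 217.0), "Gentoo"),
]


def predict_species(measurements, k=3):
    """Return the most common species among the k nearest training rows."""
    nearest = sorted(training, key=lambda row: math.dist(row[0], measurements))[:k]
    votes = Counter(species for _, species in nearest)
    return votes.most_common(1)[0][0]


print(predict_species((39.5, 185.0)))   # Adelie
print(predict_species((46.8, 218.0)))   # Gentoo
```

The model looks up the three most similar known penguins and lets them vote on the species of the new one.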
📊 Stage 5 — Evaluation
- Use Test & Score — check Accuracy, Precision, Recall, F1 for each species.
- Use Confusion Matrix to see which species is being confused with which.
- If accuracy is poor, try different algorithms or tune parameters.
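
The evaluation measures above can be worked out by hand from a confusion matrix. The sketch below uses ten made-up test penguins with hypothetical predictions; the numbers are for illustration only.

```python
from collections import Counter

species = ["Adelie", "Chinstrap", "Gentoo"]

# Hypothetical true labels and model predictions for ten test penguins.
actual = ["Adelie", "Adelie", "Adelie", "Adelie", "Chinstrap",
          "Chinstrap", "Chinstrap", "Gentoo", "Gentoo", "Gentoo"]
predicted = ["Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap",
             "Chinstrap", "Adelie", "Gentoo", "Gentoo", "Gentoo"]

# Confusion matrix: (actual, predicted) pair mapped to how often it occurred.
confusion = Counter(zip(actual, predicted))

accuracy = sum(confusion[(s, s)] for s in species) / len(actual)


def metrics(label):
    """Precision, recall and F1 for one species."""
    true_positive = confusion[(label, label)]
    predicted_as_label = sum(confusion[(a, label)] for a in species)
    actually_label = sum(confusion[(label, p)] for p in species)
    precision = true_positive / predicted_as_label if predicted_as_label else 0.0
    recall = true_positive / actually_label if actually_label else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


print(f"Accuracy: {accuracy:.2f}")
for s in species:
    p, r, f = metrics(s)
    print(f"{s:<10} precision = {p:.2f}, recall = {r:.2f}, F1 = {f:.2f}")
```

In this made-up example one Adelie is mistaken for a Chinstrap and one Chinstrap for an Adelie, which is exactly the kind of mix-up a confusion matrix makes visible.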
🚀 Stage 6 — Prediction / Deployment
- Feed new unseen penguin measurements into the trained model.
- The Prediction widget outputs the predicted species.
- Export the trained model for use in a real app — e.g., a wildlife-research mobile app.
4.15 Limitations of No-Code AI Tools
- Only work for standard problem types — custom problems still need code.
- Limited customisation — fixed widgets and options.
- May not scale for very large datasets.
- Security concerns with sensitive data.
- Automation bias — users may trust results without questioning.
- Not suitable for complex Deep-Learning research.
Quick Revision — Key Points to Remember
- Data Science = unifies statistics + data analysis + ML + their related methods.
- Applications: internet search (Google processes 20 PB/day), targeted ads, recommendations (Amazon/Netflix), genetics & genomics, healthcare, fraud detection, stock market, weather.
- 3 coding approaches: High-Code (manual coding) · Low-Code (visual + some code) · No-Code (drag-drop only, no coding).
- Why No-Code AI: no code errors, saves cost, no AI hiring, easy to use, visual, fast prototyping.
- Who uses No-Code: non-technical people — doctors, architects, musicians, teachers.
- Benefits: accessible, cost-effective, time-saving, visual, low-risk, easy iteration.
- Disadvantages: lack of flexibility · automation bias · security issues.
- Popular No-Code Tools: Azure ML (2014) · Google Cloud AutoML (2018) · Orange Data Mining (1996) · Lobe AI (2015) · Teachable Machine (2017) · DataRobot (2012).
- Kayla's Zoo Example: food-price prediction built in Orange without coding using 7 steps.
- Statistics concepts: Population & Sample · Mean (average) · Median (middle) · Mode (most frequent) · Distribution (frequency chart) · Normal distribution · Probability · Variance · Standard Deviation · Outlier.
- MS Excel: Analysis ToolPak add-in → scatter plot → trendline → regression equation (y = mx + c).
- Orange Data Mining = open-source drag-and-drop data-mining tool by Uni of Ljubljana.
- Orange widget categories: Data Loading (File/URL/Data Table) · Exploration (Scatter Plot, Distributions) · Preprocessing (Impute, Normalize) · Feature Selection · Modelling (Classification Tree, k-Means, SVM, Linear/Logistic Regression) · Evaluation (Test & Score, Cross Validation, ROC) · Visualization (Bar Chart, Heat Map).
- AI Project Cycle in Orange: Problem Scoping → File widget → Data Table/Scatter → Model widget → Test & Score → Prediction.
- Palmer Penguins Case Study: classify species (Adelie, Chinstrap, Gentoo) using Bill Length, Bill Depth, Flipper Length, Body Mass, Island, Sex.
- No-Code limitations: limited customisation, scalability issues, security concerns, automation bias.