đź“•

LLM vs ML vs DL (EN)

Category: Course
Status: Prepared
LLM
📌

LLM = Large Language Model

Definition

ChatGPT is an LLM, a Large Language Model: a model trained on huge volumes of text to understand, generate, rephrase, or translate human language.

An LLM is a large language model based on artificial intelligence (AI) that can process and generate text by imitating human language.

Where does the text come from?

The training texts come from very large amounts of public text data, automatically collected from the Internet.

Public websites

  • Wikipedia (in all languages)
  • Forums (Reddit, StackOverflow…)
  • Blogs, articles, tutorials, manuals, etc.
  • Q&A websites (Quora, etc.)

These texts are publicly accessible and used to teach natural language, facts, grammar rules…

Books and literature

  • Public domain books (i.e., free to use)
  • Academic literature, textbooks, essays, etc.

Some book datasets are legally available for training (e.g., Project Gutenberg).

Scientific and technical data

  • arXiv publications (scientific preprints)
  • StackExchange / GitHub data (code, documentation)

These help train the model on scientific, mathematical, and technical language…

What is NOT used:

  • Private data (emails, private messages, etc.)
  • Copyrighted data without explicit permission
  • Paid or confidential data (internal company docs, Google Docs, etc.)

GPT-4 was also “fine-tuned” afterwards

After pretraining, the model was fine-tuned using supervised learning (humans rate good answers) and reinforcement learning (to improve conversation quality). Much of this rating work is done by clickworkers.

  • AI may seem “magical,” but it’s based on a lot of often-invisible human work.
  • This raises questions about social justice, transparency, and exploitation.
  • OpenAI, Google, Meta and others are often criticized for lack of transparency and the working conditions of these workers.

What data is used to train an LLM to write code?

GitHub (public repos)

  • Open-source code in Python, JavaScript, Java, C++, etc.
  • Comments, README.md files, unit tests, install scripts
GPT-3 and Codex (the model behind GitHub Copilot) were trained on billions of lines of code.

StackOverflow / StackExchange

  • Code examples with Q&A
  • Best practices, common mistakes

Technical documentation

  • Python docs, MDN Web Docs, Java API, etc.
  • Framework docs (React, Django, Flask…)

Tutorials, blogs, articles

  • Commented code, educational projects
  • Step-by-step explanations

Specialized datasets (arXiv, Papers with Code, etc.)

  • Academic code for AI, data science, etc.

How does the model learn code?

Same as with text:

  • Code is turned into tokens (keywords, symbols, variable names, etc.)
  • The model learns to predict the next part of an instruction
  • It understands structures:
    • if, for, while
    • Indentation
    • Function and object declarations
    • Error handling, tests, etc.

Most importantly: it learns to connect code with comments and business logic.
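
As a rough illustration of how code is turned into tokens, here is a minimal sketch using OpenAI's tiktoken library (an assumption for illustration; any tokenizer would show the same idea):

import tiktoken

# Load the tokenizer used by GPT-3.5/GPT-4-era models
enc = tiktoken.get_encoding("cl100k_base")

code = "def add(a, b):\n    return a + b"
token_ids = enc.encode(code)
print(token_ids)                              # a list of integer token IDs
print([enc.decode([t]) for t in token_ids])   # the text piece behind each ID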


Size of the training dataset

GPT-4 (2023) – data not officially published

OpenAI has not revealed the exact size of the GPT-4 dataset.

But experts estimate:

  • The volume may exceed 1,000 GB to 10,000 GB of text (1 to 10 TB)
  • The number of tokens (small chunks of text, roughly word fragments) may be in the trillions

The model was also likely trained with more books, code (GitHub), technical documents, and human corrections.

| Model | Estimated text volume | Approx. word count |
| --- | --- | --- |
| GPT-2 (2019) | ~40 GB | ~40 billion |
| GPT-3 (2020) | ~570 GB | ~250 billion |
| GPT-4 (2023) | > 1 TB | > 1 trillion (estimated) |
How does an LLM work?

These models are trained on huge amounts of text data and use advanced architectures like deep neural networks (Deep Learning), especially Transformers, introduced by Google in 2017.

An LLM follows several key steps:

  1. Training on billions of texts
    • It learns the structure of language by analyzing texts from articles, books, online discussions, etc.
    • The larger the model, the more powerful and accurate it becomes.
  2. Use of Transformer technology
    • The Transformer architecture (e.g., GPT, BERT) uses a mechanism called “Attention” to give more importance to the key words in a sentence (prompt).
    • The Transformer is an AI architecture introduced by Google in 2017 (in the paper “Attention Is All You Need”). It is the base of GPT, BERT, ChatGPT, etc.
    • Why was the Transformer created?
      • Before, language models used systems like RNNs or LSTMs, which read sentences word by word, from left to right.
      • Problem: they struggled with long sentences and forgot what happened at the beginning of a text. The Transformer solved this.
    • The core of the Transformer is attention. How does attention work?
      • For each word, attention computes: which other words should I focus on, and how much? It assigns weights (for example: 80% to “cat”, 10% to “mouse”, 10% to “was chasing”) to capture the meaning. A minimal code sketch follows the examples below.
      • Once the model has found the important words using attention, it passes this information to layers of neurons (mathematical functions) to refine its understanding and predict the next word.
    • The Transformer reads the whole text in parallel (not word by word). That is why GPT, BERT, and ChatGPT are so powerful.
    • Examples:
      “It’s very hot in summer, so I go to the…”

      The model might suggest endings like: “beach”, “sea”, “pool”, etc. because these are the most likely words in that context.

      The language model looks at the words before and calculates the probabilities of what comes next:

      • “beach” → 62%
      • “mountains” → 20%
      • “pool” → 15%
      • “school” → 1%

      It chooses the most probable word (or one of the top choices for diversity), then repeats for the next word.

      There's a lot of context:

      • the model can consider full sentences,
      • it's trained on billions of sentences,
      • it's not guessing randomly, it’s based on what it has “seen” in the data.
      “This morning, I drank a big glass of…”

      Here's what the AI might predict, with estimated probabilities:

      | Suggested word | Probability |
      | --- | --- |
      | milk | 45% |
      | juice | 30% |
      | coffee | 15% |
      | hot chocolate | 5% |
      | tea | 3% |
      | wine | 1% |
      | vinegar | 0.1% |
      “When it rains, I like to stay home and watch…”
      | Word or phrase | Probability |
      | --- | --- |
      | TV | 50% |
      | a movie | 20% |
      | a series | 15% |
      | Netflix | 8% |
      | the rain falling | 4% |
      | YouTube videos | 2% |
      | nothing at all | 1% |
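
To make the attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer (the vectors and weights are toy values, not a real model):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy embeddings for the 6 words of "the cat was chasing the mouse" (random values)
d = 4                             # embedding dimension
rng = np.random.default_rng(0)
X = rng.standard_normal((6, d))   # one row per word

# In a real Transformer, Q, K, V come from learned projection matrices
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Attention weights: for each word, how much to focus on every other word
weights = softmax(Q @ K.T / np.sqrt(d))   # shape (6, 6), each row sums to 1
output = weights @ V                      # weighted mix of the value vectors

print(weights.round(2))   # each row: how much one word attends to every other word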

Text generation based on probabilities

  1. When you ask a question, the model predicts the next word based on the context.
  2. It generates coherent answers without truly "understanding" like a human.
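
The probability tables above can be simulated in a few lines. A minimal sketch (the words and probabilities are the illustrative values from the examples, not real model outputs):

import random

# Illustrative next-word distribution for "This morning, I drank a big glass of…"
candidates = ["milk", "juice", "coffee", "hot chocolate", "tea", "wine", "vinegar"]
weights    = [0.45,   0.30,    0.15,     0.05,            0.03,  0.01,   0.001]

# Greedy decoding: always pick the most probable word
print(max(zip(candidates, weights), key=lambda pair: pair[1])[0])   # "milk"

# Sampling: pick proportionally to the probabilities, for more diversity
print(random.choices(candidates, weights=weights, k=1)[0])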

Google released the Transformer as open source

When Google invented the Transformer architecture in 2017, a team from Google Brain published a paper called “Attention Is All You Need”. Most importantly, they did not patent the algorithm.

They:

  • published the code and ideas,
  • allowed anyone to use, modify, and improve them.

Result:

  • Facebook (Meta) created BART and LLaMA
  • OpenAI created GPT-2, GPT-3, GPT-4
  • Google continued with BERT, then PaLM, then Gemini
  • Hundreds of teams around the world also built their own Transformer models

In science, ideas are meant to be shared

In AI (and science in general), it's common to publish discoveries so that:

  • everyone can benefit,
  • research moves faster,
  • we can build on others’ work.

Open source and scientific papers are the foundation of this collaboration.

What makes companies different?

  • The amount of training data (OpenAI used much more data for GPT-4 than others)
  • Computing power
  • Optimization (speed, safety, alignment, etc.)
  • User experience (interface, API, etc.)
Examples of Popular LLMs
  • GPT (Generative Pre-trained Transformer) – Used by ChatGPT (OpenAI)
  • BERT (Bidirectional Encoder Representations from Transformers) – Google
  • LLaMA (Large Language Model Meta AI) – Meta (Facebook)
  • Claude (Anthropic) – Another advanced model

LLMs can be specialized: code analysis, text writing, customer support, document summarization…

Applications of LLMs
  • Chatbots (ChatGPT, Bard)
  • Text summarization and analysis
  • Machine translation (Google Translate, DeepL)
  • Code generation (Copilot, Codeium)
  • Content creation (articles, scripts, etc.)
Limitations and Challenges
  • Bias and errors – Models learn biases from the data they are trained on.
  • Lack of real reasoning – They predict words but don’t truly “understand.”
  • Data dependence – A poorly trained model can produce incorrect answers.
  • High energy cost – Training an LLM uses a lot of computing resources.
  • It’s not like a human. The model doesn’t “understand” in a conscious way:
    • It doesn’t reason like a human.
    • It doesn’t know anything by itself. It repeats what it has seen in its training data, based on statistical probability.

But it is very good at imitating human language, solving problems, generating code, translating, summarizing...

Deep Learning

Deep learning is an artificial intelligence technique that uses deep neural networks (made of multiple layers).

It can be used to:

  • understand images (computer vision),
  • recognize sounds (audio, voice),
  • play games (intelligent agents),
  • process text (NLP)...

LLM: an application of deep learning

An LLM (Large Language Model) is a special type of deep learning model, specialized in natural language.

Machine learning

Machine Learning is a branch of artificial intelligence that allows a machine to learn from data.

2 main types of Machine Learning

  • Supervised learning
    • The algorithm is given input data and the expected answers.
    • Example: predicting the price of an apartment based on its size, location, etc.
    • Common tasks:
      • Classification (spam or not spam, disease or not)
      • Regression (predicting a numeric value: price, temperature, score)
  • Unsupervised learning
    • The algorithm is only given input data, with no expected answers. It must discover the structure.
    • Example: grouping customers into marketing segments based on purchase behavior.
    • Common tasks:
      • Clustering (groups, profiles)
      • Dimensionality reduction (simplifying complex data, like in text or image analysis)
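
To make the unsupervised case concrete, here is a minimal scikit-learn sketch of customer segmentation with K-means (synthetic data; the features are invented for illustration):

import numpy as np
from sklearn.cluster import KMeans

# Synthetic customers: [purchases per month, average basket in euros]
customers = np.array([
    [2, 15], [3, 20], [2, 18],       # occasional small buyers
    [10, 40], [12, 35], [11, 45],    # frequent mid-range buyers
    [4, 200], [5, 250], [3, 220],    # rare big spenders
])

# K-means must discover 3 groups on its own, with no labels given
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(customers)
print(kmeans.labels_)            # the segment assigned to each customer
print(kmeans.cluster_centers_)   # the "average profile" of each segment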

Example: detecting bank fraud (supervised)

  • Input data: amount, location, time, type of purchase
  • Expected answer: fraudulent or normal transaction
  • Steps:
    1. Collect a historical dataset
    2. Train a model on this dataset
    3. Test the model on new data
    4. Use it in production to analyze transactions in real time
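
A minimal scikit-learn sketch of these four steps, on a synthetic dataset (all values are invented; a real system would use historical transactions):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 1 - a synthetic "historical dataset": [amount in euros, hour of day]
rng = np.random.default_rng(0)
normal = np.column_stack([rng.uniform(5, 200, 200), rng.uniform(8, 22, 200)])
fraud = np.column_stack([rng.uniform(500, 5000, 20), rng.uniform(0, 6, 20)])
X = np.vstack([normal, fraud])
y = np.array([0] * 200 + [1] * 20)   # 1 = fraudulent

# Step 2 - train a model on part of this dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

# Step 3 - test the model on data it has never seen
print("accuracy:", clf.score(X_test, y_test))

# Step 4 - in production, score a new transaction in real time
print(clf.predict([[2500, 3]]))   # a large amount at 3 a.m. is likely flagged as fraud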

Common algorithms

  • Linear / logistic regression
  • Decision trees

Tools and languages

  • Python, with libraries like scikit-learn, TensorFlow, PyTorch
  • R: often used in statistics and data science
  • Jupyter Notebook: for easy experimentation
  • Google Colab: to test in the cloud

Create your own ML model

You can train a model on your own data to solve a specific problem (prediction, detection, recommendation…).

Steps to create a Machine Learning model

  • Define the problem

    What is your goal?

    • Predict revenue?
    • Classify emails (spam / not spam)?
    • Recommend products?
  • Collect the data

    Reliable and well-structured data.

    Examples:

    • A CSV file with columns: height, weight, age, diagnosis
    • A customer database with purchase history
  • Prepare the data

    This is often the longest step.

    • Cleaning: remove missing values, duplicates
    • Transformation: convert text to numbers, normalize values
    • Split: divide into training set and test set
  • Choose an algorithm

    Examples:

    • Linear regression to predict a value
    • Decision tree for classification
    • K-means for grouping
  • Train the model

    This is where the computer learns.

  • Evaluate the model

    Test its accuracy on data it has never seen.

  • Improve the model
    • Try a different algorithm
    • Add more data
    • Improve the features (variables)
  • Use the model

    You can now make predictions on new data:
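
For instance, a minimal end-to-end sketch with scikit-learn's linear regression (the apartment data is invented for illustration):

from sklearn.linear_model import LinearRegression

# Training data: apartment sizes in m² and their prices in thousands of euros
sizes = [[30], [45], [60], [75], [90]]
prices = [150, 210, 270, 330, 390]

model = LinearRegression().fit(sizes, prices)

# Predict the price of a new 50 m² apartment
print(model.predict([[50]]))   # about 230 (thousand euros) with this toy data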

Tools:

  • Python with scikit-learn, pandas, matplotlib
  • Google Colab to test without installing anything
  • Jupyter Notebook locally

Example

Netflix is a large-scale example of Machine Learning, used in almost every part of its product. Here's how Netflix uses Machine Learning:

  • Content recommendation (core system)
    • Suggests series and movies each user might like.
    • You watched Stranger Things and Dark? You might be shown 1899, even if you never searched for it.
  • Personalized thumbnails (homepage posters)

    To increase the chance you’ll click on a show.

    Netflix tests different images for the same content based on your preferences:

    • If you like comedies: image with a funny character
    • If you like thrillers: dark and dramatic image

    It’s A/B testing + supervised ML: they see what works best for each profile.

  • Streaming quality prediction

    Predict interruptions or slowdowns to improve streaming.

    • ML on network data (location, device, streaming history…)
    • Auto-adjusting quality (adaptive bitrate)
    • Anticipating peak times to preload content smartly
  • Content production optimization

    Decide which projects to fund (movies, series), based on audience preferences.

    • Analyzing engagement on genres / formats
    • Predicting the success of a script, casting, duration…
    • Example: House of Cards was greenlit partly because data showed people liked both Kevin Spacey and political dramas.
  • Fighting fraud and account sharing

    Detect suspicious behavior (excessive sharing, bots…)

    • Analyzing IP addresses, times, devices
    • Anomaly detection models
    • Segmenting "suspicious" users
ML vs DL
| Criterion | Machine Learning (classic) | Deep Learning |
| --- | --- | --- |
| Types of data | Tabular data (Excel, CSV) | Unstructured data (image, sound, text, video) |
| Data volume | Works with small datasets | Needs a lot of data (millions of examples) |
| Computing power | Moderate, often runs on CPU | Very high, often needs GPUs |
| Explainability | Easier to interpret | Black box, hard to explain |
| Typical AI uses | Price prediction, diagnostics, scoring | Facial recognition, translation, generative AI |
| Known tools | Scikit-learn, XGBoost | TensorFlow, PyTorch, Keras |
| Tool / Platform | Main AI type | Underlying technology | Typical uses |
| --- | --- | --- | --- |
| Scikit-learn | ML (Machine Learning) | Trees, SVM, regressions, clustering | Data analysis, simple prediction, classification |
| XGBoost / LightGBM | ML | Tree boosting | High-performance models on tabular data (finance, scoring…) |
| Orange Data Mining | ML | Graphical interface for ML algorithms | Education, visualization, classification, clustering |
| RapidMiner | ML | ML workflows | Business analytics, scoring, anomaly detection |
| TensorFlow | DL | Neural networks | Vision, audio, language processing, generative AI |
| PyTorch | DL | Neural networks | AI research, NLP, Computer Vision |
| Keras | DL (high-level) | TensorFlow abstraction | Neural network prototyping |
| OpenCV + ML | ML | Built-in SVM, kNN | Simple computer vision (object, face detection) |
| OpenCV + DL (with DNN) | DL | Pretrained CNNs | Advanced vision (real-time detection, facial recognition) |
| ChatGPT / GPT | DL | Transformer, NLP | Chat, text generation, summarization, code |
| Claude / Gemini / LLaMA | DL | Transformer | Large language models |
| DALL·E / MidJourney | DL | Diffusion models / GAN | Image generation from text |
| Whisper (OpenAI) | DL | Audio-to-text with Transformer | Speech transcription |
Strong AI and Weak AI

đź’ˇ

It's not just ChatGPT and generative AI. AI has already been used in various forms for several years.

Two main types of AI

  • Strong AI (hypothetical future)
    • Description: AI capable of understanding, learning, and performing any task a human can do, with similar cognitive abilities.
    • Examples:
      • It doesn’t exist yet in real life, but is often shown in science fiction movies, like Jarvis in Iron Man.
    • Goal: To be as flexible and intelligent as a human.
    • Strong AI is a science-fiction concept, explored by authors such as Philip K. Dick.
    • Many movies talk about it:
      • Westworld
      • Detroit: Become Human (video game)
      • Ex Machina by Alex Garland (2014)
        • Synopsis: A young programmer is invited by a billionaire to test the intelligence of a humanoid robot named Ava, who has advanced AI. The film explores ethical, emotional, and philosophical questions about strong AI.
        • Why watch it? It raises deep questions about consciousness, free will, and the relationship between humans and machines.
      • Blade Runner by Ridley Scott (1982) & Blade Runner 2049 by Denis Villeneuve (2017)
        • Synopsis: In a futuristic world, "replicants" (advanced androids) try to understand their place in society.
        • Why watch it? These films explore what makes intelligence (or life) authentic, examining empathy and the soul.

  • Weak AI
    • Description: Weak AI is designed to do a specific task. It is specialized in one single function.
    • Examples:
      • Google Translate
      • Siri or Alexa: Voice recognition and responses
      • Recommendation systems: Netflix or YouTube
      • Video games: Computer-controlled opponents
      • Facial recognition
      • IBM’s Deep Blue: The program that beat chess champion Garry Kasparov in 1997
      • Self-driving cars: They analyze roads, signs, and other vehicles in real time
      • Medical diagnosis: Scientists collect medical knowledge and build logic rules for the AI. The AI analyzes symptoms and test results to suggest possible causes and make diagnoses.
        These AIs are already used by doctors to help diagnose rare or complex diseases.
      • Content creation: helping creators save time, improve quality, and generate innovative ideas.
        • Automated writing: Language models like GPT can write articles, blogs, product descriptions, emails, and more.
        • Rewriting help: Rephrasing sentences or paragraphs to make them clearer or more convincing.
        • Summarizing: Turning complex documents into short summaries.
        • Idea generation: Suggesting catchy titles, content topics, or original angles.
        • Research and information gathering: AI can help find specific info or analyze trends.
        • Editorial planning: Suggest publishing calendars based on past performance and market trends.
        • Performance analysis: Measure impact and suggest adjustments (titles, formats, visuals).
        • AI acts like a powerful assistant to speed up creation, improve quality, and open new creative possibilities — while keeping humans at the center of decision-making.
      • Chatbots that only answer predefined questions and can't adapt to complex situations
      • ChatGPT is designed to do a specific task: generate text and respond to questions in a coherent and relevant way. It doesn’t have general understanding or consciousness like a human. ChatGPT is an advanced example of weak AI — it is excellent at one particular task (natural language processing) without having the general or adaptive abilities of strong AI.
      • Automation and robotics:
        • Robotic machines learn automatically to improve and speed up their movements, for example on a production line.
        • Machines are designed to communicate with each other: a designer provides the product plan at the start, then the machines coordinate among themselves to complete it.
    • Limitation: It doesn’t "understand" the task. It follows defined algorithms. Even if its responses seem smart, it doesn’t truly understand what it's saying. It uses statistical models to predict words based on the data it was trained on.

      ChatGPT cannot solve problems outside its predefined scope (for example, it can’t drive a car, design an electronic circuit, or reason abstractly beyond its training domain).

People

Marvin Minsky

American scientist who created the working groups and conferences in the 1950s and 1960s that led to the birth of Artificial Intelligence.

Minsky co-founded the MIT Artificial Intelligence Laboratory (now CSAIL, Computer Science and Artificial Intelligence Laboratory) with John McCarthy. This lab is one of the most influential AI research centers in the world and has contributed to major advancements in computing, robotics, and AI.

Luc Julia

French AI expert and co-creator of Siri.

In his public speeches in France, Luc Julia takes a more balanced view on artificial intelligence; he prefers to use the term “augmented intelligence.” In his talks, he opposes the ideas of some tech personalities, like Elon Musk. For him, it is humans who are and will remain in control of artificial intelligence. He explains that humans have the choice to use these tools in the right way to improve society.

John Hopfield

Nobel Prize in Physics 2024. In 1982, he introduced the Hopfield network, a pioneering artificial neural network.

Geoffrey Hinton

Nobel Prize in Physics 2024. In the mid-1980s, he co-developed training methods for neural networks (the Boltzmann machine, then backpropagation).

Two of his students at the University of Toronto, Alex Krizhevsky and Ilya Sutskever, developed AlexNet in 2012, a breakthrough in image recognition; Sutskever later co-founded OpenAI, the company behind ChatGPT.

Generative AI Use Case

Use case of Hanna Mergui, a PhD student working on medical imaging and its use with AI.

  • Her background: https://youtu.be/zC8xdkTxuFc?si=KdQ_x-ijHsHlZ5Jr&t=181

    Hanna chose a career in computer science because she wanted a skill she could take abroad, and a field that is always evolving.

    She first studied at the MathInfo university program, then went to Dauphine to study business computing. There, she took an introduction to AI course, and that was before the rise of ChatGPT and the AIs we know today.

    Later, Hanna joined Polytechnique through a bridge program between universities and engineering schools. There, she completed a master's degree in AI, covering:

    • the history of artificial intelligence,
    • its various applications in imaging,
    • video,
    • and sound.

    Finally, Hanna wanted to apply her skills to a field close to her heart: medicine.

  • Use case in medical imaging:
    • PhD thesis on malformations detected during prenatal ultrasounds
      • An AI trained with ultrasound images.
      • AI needs thousands of data points and images to train.
      • Generating fake images to train the AI.
      • Generative AI models are capable of creating these fake images. Two types of generative AI:
        • GAN: a generator and a discriminator
        • Diffusion models (DALL·E, MidJourney): progressively destroying an image with noise and learning to rebuild it, guided by a prompt. Related models (AudioLM, MusicLM) generate sound and music.
      • Understanding the domain
      • Ethics
      • Data anonymization
      • The doctor is not replaced — the machine helps with diagnosis. It’s like the doctor has an assistant. For example, the AI does not make any decisions. It provides lots of information and sees things (without getting tired…)
Create a Generative AI Model
📌

How do you train a generative AI model?

Training an AI model with images means teaching it to understand or generate images by giving it a large number of examples.

1. Define the goal

First, you need to know what you want the model to do:

| Goal | Example | Model type |
| --- | --- | --- |
| Classification | "Is it a cat or a dog?" | CNN |
| Detection | "Where is the face in the image?" | R-CNN, YOLO |
| Generation | "Create an image of a cat" | GAN, Diffusion |
| Segmentation | "Color each pixel by object" | U-Net, Mask R-CNN |

2. Prepare the data

Gather images

  • Use a public image dataset (e.g., from Kaggle), or use your own images

Clean and organize

  • Remove blurry or useless images
  • Create one folder per class (e.g.: data/cats/, data/dogs/)

Resize and normalize

  • Make all images the same size (e.g.: 224Ă—224 pixels)
  • Convert pixel values from [0–255] to [0–1] or [-1, 1] using a script (to fit AI algorithm needs)
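
For example, a minimal sketch with Pillow and NumPy (the file path is illustrative):

import numpy as np
from PIL import Image

img = Image.open("data/cats/cat001.jpg").resize((224, 224))   # uniform size
pixels = np.asarray(img, dtype=np.float32) / 255.0            # [0-255] -> [0-1]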

3. Choose a model

The most common are convolutional neural networks (CNN) for image analysis:

  • For beginners: ResNet, VGG, MobileNet
  • For generation: GAN, UNet, Stable Diffusion

You can also use Transfer Learning (reusing a pre-trained model).

4. Train the model

Use a framework

  • PyTorch
  • TensorFlow / Keras

Simple example using Keras

📌

Keras is a high-level open-source library in Python that allows you to easily build, train, and test AI models, especially neural networks.

Keras is used to create deep learning models.

You can do it on Google Colab (free, with GPU, fast parallel processing), for example.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

# Data preparation: rescale pixels to [0, 1] and keep 20% of images for validation
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)
train = datagen.flow_from_directory("data/", target_size=(224, 224),
                                    class_mode='binary', subset='training')
val = datagen.flow_from_directory("data/", target_size=(224, 224),
                                  class_mode='binary', subset='validation')

# Model creation: transfer learning on a pretrained MobileNetV2 backbone
base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # freeze the pretrained weights
model = Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dense(1, activation='sigmoid')  # binary output (cat/dog)
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train, validation_data=val, epochs=5)

5. Evaluate the model

  • Error statistics
  • Test with images the model has never seen before
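
With the Keras model trained above, this check might look like the following sketch (reusing the model and val objects defined earlier):

# Accuracy on the held-out validation images
loss, accuracy = model.evaluate(val)
print(f"Validation accuracy: {accuracy:.2%}")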

6. Use or deploy

  • Convert the model (.h5, .pt, .onnx)
  • Deploy in a mobile app, website, or backend
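
For the Keras model above, saving and reloading might look like this sketch (the file name is illustrative):

from tensorflow.keras.models import load_model

model.save("cat_dog_classifier.h5")              # export to the HDF5 format
restored = load_model("cat_dog_classifier.h5")   # e.g. later, in a backend service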

Useful tools

| Need | Recommended tools |
| --- | --- |
| Image annotation | LabelImg, MakeSense.ai |
| Computer vision | OpenCV, torchvision |
| Cloud training | Google Colab, Kaggle, AWS SageMaker |