đź“•

LLM vs ML vs DL (EN)

Category: Course
Status: Prepared
LLM
📌

LLM = Large Language Model

Definition

ChatGPT is an LLM, a Large Language Model: a model trained on huge volumes of text to understand, generate, rephrase, or translate human language.

An LLM is a large language model based on artificial intelligence (AI) that can process and generate text by imitating human language.

Where does the text come from?

The training texts come from very large amounts of public text data, automatically collected from the Internet.

Public websites

  • Wikipedia (in all languages)
  • Forums (Reddit, StackOverflow…)
  • Blogs, articles, tutorials, manuals, etc.
  • Q&A websites (Quora, etc.)

These texts are publicly accessible and used to teach natural language, facts, grammar rules…

Books and literature

  • Public domain books (i.e., free to use)
  • Academic literature, textbooks, essays, etc.

Some book datasets are legally available for training (e.g., Project Gutenberg).

Scientific and technical data

  • arXiv publications (scientific preprints)
  • StackExchange / GitHub data (code, documentation)

These help train the model on scientific, mathematical, and technical language…

What is NOT used:

  • Private data (emails, private messages, etc.)
  • Copyrighted data without explicit permission
  • Paid or confidential data (internal company docs, Google Docs, etc.)

GPT-4 was also “fine-tuned” afterwards

After pretraining, the model was fine-tuned using supervised learning (humans rate good answers) and reinforcement learning (to improve conversation quality). Much of this rating work is done by clickworkers.

  • AI may seem “magical,” but it’s based on a lot of often-invisible human work.
  • This raises questions about social justice, transparency, and exploitation.
  • OpenAI, Google, Meta and others are often criticized for lack of transparency and the working conditions of these workers.

What data is used to train an LLM to write code?

GitHub (public repos)

  • Open-source code in Python, JavaScript, Java, C++, etc.
  • Comments, README.md files, unit tests, install scripts
GPT-3 and Codex (the model behind GitHub Copilot) were trained on billions of lines of code.

StackOverflow / StackExchange

  • Code examples with Q&A
  • Best practices, common mistakes

Technical documentation

  • Python docs, MDN Web Docs, Java API, etc.
  • Framework docs (React, Django, Flask…)

Tutorials, blogs, articles

  • Commented code, educational projects
  • Step-by-step explanations

Specialized datasets (arXiv, Papers with Code, etc.)

  • Academic code for AI, data science, etc.

How does the model learn code?

Same as with text:

  • Code is turned into tokens (keywords, symbols, variable names, etc.)
  • The model learns to predict the next part of an instruction
  • It understands structures:
    • if, for, while
    • Indentation
    • Function and object declarations
    • Error handling, tests, etc.

Most importantly: it learns to connect code with comments and business logic.
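
As a rough illustration of how code is turned into tokens, here is a minimal sketch using OpenAI's tiktoken library (an assumption for illustration; any tokenizer would show the same idea):

import tiktoken

# Load the tokenizer used by GPT-3.5/GPT-4-era models
enc = tiktoken.get_encoding("cl100k_base")

code = "def add(a, b):\n    return a + b"
token_ids = enc.encode(code)
print(token_ids)                              # a list of integer token IDs
print([enc.decode([t]) for t in token_ids])   # the text piece behind each ID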


Size of the training dataset

GPT-4 (2023) – data not officially published

OpenAI has not revealed the exact size of the GPT-4 dataset.

But experts estimate:

  • The volume may exceed 1,000 GB to 10,000 GB of text (1 to 10 TB)
  • The number of tokens (small chunks of text, roughly word fragments) may be in the trillions

The model was also likely trained with more books, code (GitHub), technical documents, and human corrections.

| Model | Estimated text volume | Approx. word count |
| --- | --- | --- |
| GPT-2 (2019) | ~40 GB | ~40 billion |
| GPT-3 (2020) | ~570 GB | ~250 billion |
| GPT-4 (2023) | > 1 TB | > 1 trillion (estimated) |
How does an LLM work?

These models are trained on huge amounts of text data and use advanced architectures like deep neural networks (Deep Learning), especially Transformers, introduced by Google in 2017.

An LLM follows several key steps:

  1. Training on billions of texts
    • It learns the structure of language by analyzing texts from articles, books, online discussions, etc.
    • The larger the model, the more powerful and accurate it becomes.
  2. Use of Transformer technology
    • The Transformer architecture (e.g., GPT, BERT) uses a mechanism called “Attention” to give more importance to the key words in a sentence (prompt).
    • The Transformer is an AI architecture introduced by Google in 2017 (in the paper “Attention Is All You Need”). It is the base of GPT, BERT, ChatGPT, etc.
    • Why was the Transformer created?
      • Before, language models used systems like RNNs or LSTMs, which read sentences word by word, from left to right.
      • Problem: they struggled with long sentences and forgot what happened at the beginning of a text. The Transformer solved this.
    • The core of the Transformer is attention. How does attention work?
      • For each word, attention computes: which other words should I focus on, and how much? It assigns weights (for example: 80% to “cat”, 10% to “mouse”, 10% to “was chasing”) to capture the meaning. A minimal code sketch follows the examples below.
      • Once the model has found the important words using attention, it passes this information to layers of neurons (mathematical functions) to refine its understanding and predict the next word.
    • The Transformer reads the whole text in parallel (not word by word). That is why GPT, BERT, and ChatGPT are so powerful.
    • Examples:
      “It’s very hot in summer, so I go to the…”

      The model might suggest endings like: “beach”, “sea”, “pool”, etc. because these are the most likely words in that context.

      The language model looks at the words before and calculates the probabilities of what comes next:

      • “beach” → 62%
      • “mountains” → 20%
      • “pool” → 15%
      • “school” → 1%

      It chooses the most probable word (or one of the top choices for diversity), then repeats for the next word.

      There's a lot of context:

      • the model can consider full sentences,
      • it's trained on billions of sentences,
      • it's not guessing randomly, it’s based on what it has “seen” in the data.
      “This morning, I drank a big glass of…”

      Here's what the AI might predict, with estimated probabilities:

      | Suggested word | Probability |
      | --- | --- |
      | milk | 45% |
      | juice | 30% |
      | coffee | 15% |
      | hot chocolate | 5% |
      | tea | 3% |
      | wine | 1% |
      | vinegar | 0.1% |
      “When it rains, I like to stay home and watch…”
      | Word or phrase | Probability |
      | --- | --- |
      | TV | 50% |
      | a movie | 20% |
      | a series | 15% |
      | Netflix | 8% |
      | the rain falling | 4% |
      | YouTube videos | 2% |
      | nothing at all | 1% |
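
To make the attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer (the vectors and weights are toy values, not a real model):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy embeddings for the 6 words of "the cat was chasing the mouse" (random values)
d = 4                             # embedding dimension
rng = np.random.default_rng(0)
X = rng.standard_normal((6, d))   # one row per word

# In a real Transformer, Q, K, V come from learned projection matrices
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Attention weights: for each word, how much to focus on every other word
weights = softmax(Q @ K.T / np.sqrt(d))   # shape (6, 6), each row sums to 1
output = weights @ V                      # weighted mix of the value vectors

print(weights.round(2))   # each row: how much one word attends to every other word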

Text generation based on probabilities

  1. When you ask a question, the model predicts the next word based on the context.
  2. It generates coherent answers without truly "understanding" like a human.
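
The probability tables above can be simulated in a few lines. A minimal sketch (the words and probabilities are the illustrative values from the examples, not real model outputs):

import random

# Illustrative next-word distribution for "This morning, I drank a big glass of…"
candidates = ["milk", "juice", "coffee", "hot chocolate", "tea", "wine", "vinegar"]
weights    = [0.45,   0.30,    0.15,     0.05,            0.03,  0.01,   0.001]

# Greedy decoding: always pick the most probable word
print(max(zip(candidates, weights), key=lambda pair: pair[1])[0])   # "milk"

# Sampling: pick proportionally to the probabilities, for more diversity
print(random.choices(candidates, weights=weights, k=1)[0])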

Google released the Transformer as open source

When Google invented the Transformer architecture in 2017, a team from Google Brain published a paper called “Attention Is All You Need”. Most importantly, they did not patent the algorithm.

They:

  • published the code and ideas,
  • allowed anyone to use, modify, and improve them.

Result:

  • Facebook (Meta) created BART and LLaMA
  • OpenAI created GPT-2, GPT-3, GPT-4
  • Google continued with BERT, then PaLM, then Gemini
  • Hundreds of teams around the world also built their own Transformer models

In science, ideas are meant to be shared

In AI (and science in general), it's common to publish discoveries so that:

  • everyone can benefit,
  • research moves faster,
  • we can build on others’ work.

Open source and scientific papers are the foundation of this collaboration.

What makes companies different?

  • The amount of training data (OpenAI used much more data for GPT-4 than others)
  • Computing power
  • Optimization (speed, safety, alignment, etc.)
  • User experience (interface, API, etc.)
Examples of Popular LLMs
  • GPT (Generative Pre-trained Transformer) – Used by ChatGPT (OpenAI)
  • BERT (Bidirectional Encoder Representations from Transformers) – Google
  • LLaMA (Large Language Model Meta AI) – Meta (Facebook)
  • Claude (Anthropic) – Another advanced model

LLMs can be specialized: code analysis, text writing, customer support, document summarization…

Applications of LLMs
  • Chatbots (ChatGPT, Bard)
  • Text summarization and analysis
  • Machine translation (Google Translate, DeepL)
  • Code generation (Copilot, Codeium)
  • Content creation (articles, scripts, etc.)
Limitations and Challenges
  • Bias and errors – Models learn biases from the data they are trained on.
  • Lack of real reasoning – They predict words but don’t truly “understand.”
  • Data dependence – A poorly trained model can produce incorrect answers.
  • High energy cost – Training an LLM uses a lot of computing resources.
  • It’s not like a human. The model doesn’t “understand” in a conscious way:
    • It doesn’t reason like a human.
    • It doesn’t know anything by itself. It repeats what it has seen in its training data, based on statistical probability.

But it is very good at imitating human language, solving problems, generating code, translating, summarizing...

Deep Learning

Deep learning is an artificial intelligence technique that uses deep neural networks (made of multiple layers).

It can be used to:

  • understand images (computer vision),
  • recognize sounds (audio, voice),
  • play games (intelligent agents),
  • process text (NLP)...

LLM: an application of deep learning

An LLM (Large Language Model) is a special type of deep learning model, specialized in natural language.

Machine learning

Machine Learning is a branch of artificial intelligence that allows a machine to learn from data.

2 main types of Machine Learning

  • Supervised learning
    • The algorithm is given input data and the expected answers.
    • Example: predicting the price of an apartment based on its size, location, etc.
    • Common tasks:
      • Classification (spam or not spam, disease or not)
      • Regression (predicting a numeric value: price, temperature, score)
  • Unsupervised learning
    • The algorithm is only given input data, with no expected answers. It must discover the structure.
    • Example: grouping customers into marketing segments based on purchase behavior.
    • Common tasks:
      • Clustering (groups, profiles)
      • Dimensionality reduction (simplifying complex data, like in text or image analysis)
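
To make the unsupervised case concrete, here is a minimal scikit-learn sketch of customer segmentation with K-means (synthetic data; the features are invented for illustration):

import numpy as np
from sklearn.cluster import KMeans

# Synthetic customers: [purchases per month, average basket in euros]
customers = np.array([
    [2, 15], [3, 20], [2, 18],       # occasional small buyers
    [10, 40], [12, 35], [11, 45],    # frequent mid-range buyers
    [4, 200], [5, 250], [3, 220],    # rare big spenders
])

# K-means must discover 3 groups on its own, with no labels given
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(customers)
print(kmeans.labels_)            # the segment assigned to each customer
print(kmeans.cluster_centers_)   # the "average profile" of each segment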

Example: detecting bank fraud (supervised)

  • Input data: amount, location, time, type of purchase
  • Expected answer: fraudulent or normal transaction
  • Steps:
    1. Collect a historical dataset
    2. Train a model on this dataset
    3. Test the model on new data
    4. Use it in production to analyze transactions in real time
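
A minimal scikit-learn sketch of these four steps, on a synthetic dataset (all values are invented; a real system would use historical transactions):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 1 - a synthetic "historical dataset": [amount in euros, hour of day]
rng = np.random.default_rng(0)
normal = np.column_stack([rng.uniform(5, 200, 200), rng.uniform(8, 22, 200)])
fraud = np.column_stack([rng.uniform(500, 5000, 20), rng.uniform(0, 6, 20)])
X = np.vstack([normal, fraud])
y = np.array([0] * 200 + [1] * 20)   # 1 = fraudulent

# Step 2 - train a model on part of this dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

# Step 3 - test the model on data it has never seen
print("accuracy:", clf.score(X_test, y_test))

# Step 4 - in production, score a new transaction in real time
print(clf.predict([[2500, 3]]))   # a large amount at 3 a.m. is likely flagged as fraud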

Common algorithms

  • Linear / logistic regression
  • Decision trees

Tools and languages

  • Python, with libraries like scikit-learn, TensorFlow, PyTorch
  • R: often used in statistics and data science
  • Jupyter Notebook: for easy experimentation
  • Google Colab: to test in the cloud

Create your own ML model

You can train a model on your own data to solve a specific problem (prediction, detection, recommendation…).

Steps to create a Machine Learning model

  • Define the problem

    What is your goal?

    • Predict revenue?
    • Classify emails (spam / not spam)?
    • Recommend products?
  • Collect the data

    Reliable and well-structured data.

    Examples:

    • A CSV file with columns: height, weight, age, diagnosis
    • A customer database with purchase history
  • Prepare the data

    This is often the longest step.

    • Cleaning: remove missing values, duplicates
    • Transformation: convert text to numbers, normalize values
    • Split: divide into training set and test set
  • Choose an algorithm

    Examples:

    • Linear regression to predict a value
    • Decision tree for classification
    • K-means for grouping
  • Train the model

    This is where the computer learns.

  • Evaluate the model

    Test its accuracy on data it has never seen.

  • Improve the model
    • Try a different algorithm
    • Add more data
    • Improve the features (variables)
  • Use the model

    You can now make predictions on new data:
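
For instance, a minimal end-to-end sketch with scikit-learn's linear regression (the apartment data is invented for illustration):

from sklearn.linear_model import LinearRegression

# Training data: apartment sizes in m² and their prices in thousands of euros
sizes = [[30], [45], [60], [75], [90]]
prices = [150, 210, 270, 330, 390]

model = LinearRegression().fit(sizes, prices)

# Predict the price of a new 50 m² apartment
print(model.predict([[50]]))   # about 230 (thousand euros) with this toy data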

Tools:

  • Python with scikit-learn, pandas, matplotlib
  • Google Colab to test without installing anything
  • Jupyter Notebook locally

Example

Netflix is a large-scale example of Machine Learning, used in almost every part of its product. Here's how Netflix uses Machine Learning:

  • Content recommendation (core system)
    • Suggests series and movies each user might like.
    • You watched Stranger Things and Dark? You might be shown 1899, even if you never searched for it.
  • Personalized thumbnails (homepage posters)

    To increase the chance you’ll click on a show.

    Netflix tests different images for the same content based on your preferences:

    • If you like comedies: image with a funny character
    • If you like thrillers: dark and dramatic image

    It’s A/B testing + supervised ML: they see what works best for each profile.

  • Streaming quality prediction

    Predict interruptions or slowdowns to improve streaming.

    • ML on network data (location, device, streaming history…)
    • Auto-adjusting quality (adaptive bitrate)
    • Anticipating peak times to preload content smartly
  • Content production optimization

    Decide which projects to fund (movies, series), based on audience preferences.

    • Analyzing engagement on genres / formats
    • Predicting the success of a script, casting, duration…
    • Example: House of Cards was greenlit partly because data showed people liked both Kevin Spacey and political dramas.
  • Fighting fraud and account sharing

    Detect suspicious behavior (excessive sharing, bots…)

    • Analyzing IP addresses, times, devices
    • Anomaly detection models
    • Segmenting "suspicious" users
ML vs DL
| Criterion | Machine Learning (classic) | Deep Learning |
| --- | --- | --- |
| Types of data | Tabular data (Excel, CSV) | Unstructured data (image, sound, text, video) |
| Data volume | Works with small datasets | Needs a lot of data (millions of examples) |
| Computing power | Moderate, often runs on CPU | Very high, often needs GPUs |
| Explainability | Easier to interpret | Black box, hard to explain |
| Typical AI uses | Price prediction, diagnostics, scoring | Facial recognition, translation, generative AI |
| Known tools | Scikit-learn, XGBoost | TensorFlow, PyTorch, Keras |
| Tool / Platform | Main AI type | Underlying technology | Typical uses |
| --- | --- | --- | --- |
| Scikit-learn | ML (Machine Learning) | Trees, SVM, regressions, clustering | Data analysis, simple prediction, classification |
| XGBoost / LightGBM | ML | Tree boosting | High-performance models on tabular data (finance, scoring…) |
| Orange Data Mining | ML | Graphical interface for ML algorithms | Education, visualization, classification, clustering |
| RapidMiner | ML | ML workflows | Business analytics, scoring, anomaly detection |
| TensorFlow | DL | Neural networks | Vision, audio, language processing, generative AI |
| PyTorch | DL | Neural networks | AI research, NLP, Computer Vision |
| Keras | DL (high-level) | TensorFlow abstraction | Neural network prototyping |
| OpenCV + ML | ML | Built-in SVM, kNN | Simple computer vision (object, face detection) |
| OpenCV + DL (with DNN) | DL | Pretrained CNNs | Advanced vision (real-time detection, facial recognition) |
| ChatGPT / GPT | DL | Transformer, NLP | Chat, text generation, summarization, code |
| Claude / Gemini / LLaMA | DL | Transformer | Large language models |
| DALL·E / MidJourney | DL | Diffusion models / GAN | Image generation from text |
| Whisper (OpenAI) | DL | Audio-to-text with Transformer | Speech transcription |
Strong AI and Weak AI

đź’ˇ

It's not just ChatGPT and generative AI. AI has already been used in various forms for several years.

Two main types of AI

  • Strong AI (hypothetical future)
    • Description: AI capable of understanding, learning, and performing any task a human can do, with similar cognitive abilities.
    • Examples:
      • It doesn’t exist yet in real life, but is often shown in science fiction movies, like Jarvis in Iron Man.
    • Goal: To be as flexible and intelligent as a human.
    • Strong AI is a science-fiction concept, explored by authors such as Philip K. Dick.
    • Many movies talk about it:
      • Westworld
      • Detroit: Become Human (video game)
      • Ex Machina by Alex Garland (2014)
        • Synopsis: A young programmer is invited by a billionaire to test the intelligence of a humanoid robot named Ava, who has advanced AI. The film explores ethical, emotional, and philosophical questions about strong AI.
        • Why watch it? It raises deep questions about consciousness, free will, and the relationship between humans and machines.
      • Blade Runner by Ridley Scott (1982) & Blade Runner 2049 by Denis Villeneuve (2017)
        • Synopsis: In a futuristic world, "replicants" (advanced androids) try to understand their place in society.
        • Why watch it? These films explore what makes intelligence (or life) authentic, examining empathy and the soul.

  • Weak AI
    • Description: Weak AI is designed to do a specific task. It is specialized in one single function.
    • Examples:
      • Google Translate
      • Siri or Alexa: Voice recognition and responses
      • Recommendation systems: Netflix or YouTube
      • Video games: Computer-controlled opponents
      • Facial recognition
      • IBM’s Deep Blue: The program that beat chess champion Garry Kasparov in 1997
      • Self-driving cars: They analyze roads, signs, and other vehicles in real time
      • Medical diagnosis: Scientists collect medical knowledge and build logic rules for the AI. The AI analyzes symptoms and test results to suggest possible causes and make diagnoses.
        These AIs are already used by doctors to help diagnose rare or complex diseases.
      • Content creation: helping creators save time, improve quality, and generate innovative ideas.
        • Automated writing: Language models like GPT can write articles, blogs, product descriptions, emails, and more.
        • Rewriting help: Rephrasing sentences or paragraphs to make them clearer or more convincing.
        • Summarizing: Turning complex documents into short summaries.
        • Idea generation: Suggesting catchy titles, content topics, or original angles.
        • Research and information gathering: AI can help find specific info or analyze trends.
        • Editorial planning: Suggest publishing calendars based on past performance and market trends.
        • Performance analysis: Measure impact and suggest adjustments (titles, formats, visuals).
        • AI acts like a powerful assistant to speed up creation, improve quality, and open new creative possibilities — while keeping humans at the center of decision-making.
      • Chatbots that only answer predefined questions and can't adapt to complex situations
      • ChatGPT is designed to do a specific task: generate text and respond to questions in a coherent and relevant way. It doesn’t have general understanding or consciousness like a human. ChatGPT is an advanced example of weak AI — it is excellent at one particular task (natural language processing) without having the general or adaptive abilities of strong AI.
      • Automation and robotics:
        • Robotic machines learn automatically to improve and speed up their movements, for example on a production line.
        • Machines are designed to communicate with each other: a designer provides the product plan at the start, then the machines coordinate among themselves to complete it.
    • Limitation: It doesn’t "understand" the task. It follows defined algorithms. Even if its responses seem smart, it doesn’t truly understand what it's saying. It uses statistical models to predict words based on the data it was trained on.

      ChatGPT cannot solve problems outside its predefined scope (for example, it can’t drive a car, design an electronic circuit, or reason abstractly beyond its training domain).

People

Marvin Minsky

American scientist who created the working groups and conferences in the 1950s and 1960s that led to the birth of Artificial Intelligence.

Minsky co-founded the MIT Artificial Intelligence Laboratory (now CSAIL, Computer Science and Artificial Intelligence Laboratory) with John McCarthy. This lab is one of the most influential AI research centers in the world and has contributed to major advancements in computing, robotics, and AI.

Luc Julia

French AI expert and co-creator of Siri.

In his public speeches in France, Luc Julia takes a more balanced view on artificial intelligence; he prefers to use the term “augmented intelligence.” In his talks, he opposes the ideas of some tech personalities, like Elon Musk. For him, it is humans who are and will remain in control of artificial intelligence. He explains that humans have the choice to use these tools in the right way to improve society.

John Hopfield

Nobel Prize in Physics 2024. In 1982, he introduced the Hopfield network, a pioneering artificial neural network.

Geoffrey Hinton

Nobel Prize in Physics 2024. In the mid-1980s, he co-developed training methods for neural networks (the Boltzmann machine, then backpropagation).

Two of his students at the University of Toronto, Alex Krizhevsky and Ilya Sutskever, developed AlexNet in 2012, a breakthrough in image recognition; Sutskever later co-founded OpenAI, the company behind ChatGPT.

Generative AI Use Case

Use case of Hanna Mergui, a PhD student working on medical imaging and its use with AI.

  • Her background: https://youtu.be/zC8xdkTxuFc?si=KdQ_x-ijHsHlZ5Jr&t=181

    Hanna chose a career in computer science because she wanted a skill she could take abroad, and a field that is always evolving.

    She first studied at the MathInfo university program, then went to Dauphine to study business computing. There, she took an introduction to AI course, and that was before the rise of ChatGPT and the AIs we know today.

    Later, Hanna joined Polytechnique through a bridge program between universities and engineering schools. There, she completed a master's degree in AI, covering:

    • the history of artificial intelligence,
    • its various applications in imaging,
    • video,
    • and sound.

    Finally, Hanna wanted to apply her skills to a field close to her heart: medicine.

  • Use case in medical imaging:
    • PhD thesis on malformations detected during prenatal ultrasounds
      • An AI trained with ultrasound images.
      • AI needs thousands of data points and images to train.
      • Generating fake images to train the AI.
      • Generative AI models are capable of creating these fake images. Two types of generative AI:
        • GAN: a generator and a discriminator
        • Diffusion models (DALL·E, MidJourney): progressively destroying an image with noise and learning to rebuild it, guided by a prompt. Related models (AudioLM, MusicLM) generate sound and music.
      • Understanding the domain
      • Ethics
      • Data anonymization
      • The doctor is not replaced — the machine helps with diagnosis. It’s like the doctor has an assistant. For example, the AI does not make any decisions. It provides lots of information and sees things (without getting tired…)
Create a Generative AI Model
📌

How do you train a generative AI model?

Training an AI model with images means teaching it to understand or generate images by giving it a large number of examples.

1. Define the goal

First, you need to know what you want the model to do:

| Goal | Example | Model type |
| --- | --- | --- |
| Classification | "Is it a cat or a dog?" | CNN |
| Detection | "Where is the face in the image?" | R-CNN, YOLO |
| Generation | "Create an image of a cat" | GAN, Diffusion |
| Segmentation | "Color each pixel by object" | U-Net, Mask R-CNN |

2. Prepare the data

Gather images

  • Use a public image dataset (e.g., from Kaggle), or use your own images

Clean and organize

  • Remove blurry or useless images
  • Create one folder per class (e.g.: data/cats/, data/dogs/)

Resize and normalize

  • Make all images the same size (e.g.: 224Ă—224 pixels)
  • Convert pixel values from [0–255] to [0–1] or [-1, 1] using a script (to fit AI algorithm needs)
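
For example, a minimal sketch with Pillow and NumPy (the file path is illustrative):

import numpy as np
from PIL import Image

img = Image.open("data/cats/cat001.jpg").resize((224, 224))   # uniform size
pixels = np.asarray(img, dtype=np.float32) / 255.0            # [0-255] -> [0-1]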

3. Choose a model

The most common are convolutional neural networks (CNN) for image analysis:

  • For beginners: ResNet, VGG, MobileNet
  • For generation: GAN, UNet, Stable Diffusion

You can also use Transfer Learning (reusing a pre-trained model).

4. Train the model

Use a framework

  • PyTorch
  • TensorFlow / Keras

Simple example using Keras

📌

Keras is a high-level open-source library in Python that allows you to easily build, train, and test AI models, especially neural networks.

Keras is used to create deep learning models.

You can do it on Google Colab (free, with GPU, fast parallel processing), for example.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

# Data preparation: rescale pixels to [0, 1] and keep 20% of images for validation
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)
train = datagen.flow_from_directory("data/", target_size=(224, 224),
                                    class_mode='binary', subset='training')
val = datagen.flow_from_directory("data/", target_size=(224, 224),
                                  class_mode='binary', subset='validation')

# Model creation: transfer learning on a pretrained MobileNetV2 backbone
base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # freeze the pretrained weights
model = Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dense(1, activation='sigmoid')  # binary output (cat/dog)
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train, validation_data=val, epochs=5)

5. Evaluate the model

  • Error statistics
  • Test with images the model has never seen before
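
With the Keras model trained above, this check might look like the following sketch (reusing the model and val objects defined earlier):

# Accuracy on the held-out validation images
loss, accuracy = model.evaluate(val)
print(f"Validation accuracy: {accuracy:.2%}")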

6. Use or deploy

  • Convert the model (.h5, .pt, .onnx)
  • Deploy in a mobile app, website, or backend
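
For the Keras model above, saving and reloading might look like this sketch (the file name is illustrative):

from tensorflow.keras.models import load_model

model.save("cat_dog_classifier.h5")              # export to the HDF5 format
restored = load_model("cat_dog_classifier.h5")   # e.g. later, in a backend service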

Useful tools

| Need | Recommended tools |
| --- | --- |
| Image annotation | LabelImg, MakeSense.ai |
| Computer vision | OpenCV, torchvision |
| Cloud training | Google Colab, Kaggle, AWS SageMaker |