The €50,000 Mistake: My First AI "Incident"
I still remember that cold morning in 2024 when a hallucination from an LLM (Large Language Model) – the kind of model behind ChatGPT, Gemini, Grok, and the like – cost my workplace about €50,000. We were using a fine-tuned, aligned model for a complex market analysis. It generated convincing, logically plausible – but completely fabricated – data on a niche market.
It wasn’t an obvious error. No “I’m sorry, I can’t do that” or “I don’t know.”
It was a confident statement, entirely false. It was a painful lesson that even “intelligent” AI models can fail in insidious ways.
In 2025, headlines rave about AI’s magic capabilities, but the truth is that under the hood it’s just complex math and probability.
As an engineer who has spent years building (and sometimes “breaking”) these systems, I’m here to demystify the process. This is a gentle introduction to how things work under the hood!
It’s not just about what language models do, but how they do it and why they sometimes fail, whether it is Gemini, ChatGPT, Claude, Kimi K2, Deepseek, Qwen or any other model.
What I'll talk about in this article:
- Decoding Language Models: Their probabilistic foundation and how they predict text.
- Scaling Intelligence: What makes a model "large."
- Innovative Architectures: The critical role of Transformers and self-attention.
- Navigating Risks: Hallucinations, jailbreaks, and cultural impact.
- My Unfiltered Playbook: Practical tips for tackling AI in 2025.
Decoding the Fundamentals
Let’s start at the beginning: what exactly are these “language models”, and how have they evolved into the giants we know today?
Understanding Language Models: The Probability Game
At its core, a language model is a machine learning model — a way computers learn from examples to automatically spot patterns and make predictions, improving their accuracy over time without being explicitly programmed for each specific task — designed to predict and generate plausible language. Think of your phone’s autocomplete — that’s a language model in action. These models work by estimating the probability that a token (a word or part of a word) or sequence of tokens occurs within a longer sequence.
For example, take the phrase “When I hear rain on my roof, I _______ in my kitchen.” The language model calculates probabilities for different words to fill the gap: it might assign a 9.4% probability to “cook soup” and a 2.5% probability to “take a nap”. It’s all about probabilities. The model doesn’t “think” – unless you count the “chain of thought” text that reasoning-trained models produce as a kind of internal monologue – it mechanically outputs tokens and numbers according to its internal learned logic.
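To make the “probability game” concrete, here is a tiny sketch in Python. The vocabulary and the raw scores (logits) are invented purely for illustration – a real model scores tens of thousands of tokens with a neural network – but the final step, turning scores into probabilities with a softmax, looks just like this.

```python
import math

# A toy illustration of "next-token probability", assuming a tiny,
# hand-picked vocabulary and made-up scores (logits). A real LLM does
# the same thing over ~100k tokens, with scores computed by the network.
prompt = "When I hear rain on my roof, I ____ in my kitchen."

logits = {          # raw, unnormalized scores (invented for the example)
    "cook soup":   2.1,
    "read a book": 1.3,
    "take a nap":  0.8,
    "dance":      -0.5,
}

# Softmax turns raw scores into a probability distribution that sums to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:12s} {p:.1%}")
# The model then samples (or picks) one of these tokens and repeats.
```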
Small looping video showcasing how the completion works
Scaling Intelligence: Defining “Large” in LLMs
Modeling human language at scale is extremely complex and resource-intensive.
Language models have exploded in size thanks to increased computer memory, larger datasets, greater compute power, and more effective techniques for modeling long sequences.
But just how “large” is a large language model? The term is fuzzy, but “large” has been used to describe models from BERT (110 million parameters) up to PaLM 2 (up to 540 billion parameters). There are even bigger models – closed-source ones like Gemini 2.5 Pro are generally assumed to be in the trillion-parameter range – but I think you get the idea.
Parameters – often called "weights" in machine learning – are the model’s internal settings, adjusted during training so the model gets better at predicting the next word or phrase by recognizing patterns in data. When people refer to a "large" model, they’re usually talking about its vast number of weights or the enormous amount of text it studied to sharpen its predictions.
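If you're curious where a number like "175 billion" comes from, here is a rough back-of-envelope sketch. It assumes the commonly cited rule of thumb of roughly 12 × layers × d_model² parameters for a GPT-style decoder (attention projections plus feed-forward block), ignores small terms like biases and layer norms, and plugs in GPT-3's published shape.

```python
# A rough back-of-envelope parameter count for a GPT-style (decoder-only)
# Transformer. The ~12 * layers * d_model^2 rule of thumb covers the
# attention projections (~4*d^2) and the feed-forward block (~8*d^2,
# assuming the common 4*d_model hidden size); layer norms and biases
# are small enough to ignore here.
def rough_param_count(n_layers: int, d_model: int, vocab_size: int) -> int:
    embeddings = vocab_size * d_model     # token embedding matrix
    per_layer = 12 * d_model ** 2         # attention + feed-forward
    return embeddings + n_layers * per_layer

# GPT-3's published shape: 96 layers, d_model = 12288, ~50k-token vocabulary.
print(f"{rough_param_count(96, 12288, 50257):,}")   # roughly 175 billion
```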
Here’s a quick look at how some notable models have grown:
Model | Parameters | Reference |
---|---|---|
BERT | 110–340 million | Research Paper |
LLaMA | 7–65 billion | Research Paper |
GPT-3 | 175 billion | Research Paper |
LLaMA 2 | Up to 70 billion | Research Paper |
PaLM 2 | Up to 540 billion | Google Research Blog |
A short loop video showcasing a comparison of "Large" Language Models Sizes
Architecting Intelligence: The Rise of Transformers
A key turning point was the 2017 introduction of Transformers, an architecture built around the concept of “attention” (more on this later in the article). This innovation made it possible to process longer text sequences by focusing on the most important parts of the input, solving memory bottlenecks in previous models. A full Transformer consists of two key parts: an encoder and a decoder.
The encoder converts input text into an intermediate representation, and the decoder turns that representation into useful output. This “brain” of the machine is why LLMs can do far more than predict single words – they can translate, summarize, and respond to complex prompts.
Imagine the sentence “The cat sat”. It is first broken into pieces (a step known as tokenization); a token is usually a word or a fragment of a word – roughly four characters of English on average – based on how words are built in the English dictionary.
We therefore get the tokens ["The", "cat", "sat"], and the model maps each of them to a long vector of numbers – called an embedding – so the word "The" might become [0.13999, 1.0, 0.1658, 0.314, ...]. In this simplified picture, you can read the second number as a weight for “refers to a definite thing”, and its value of 1.0 (100%) marks it as certain, “The” being in fact a definite article. (Real embedding dimensions are rarely this neatly interpretable.)
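Here is a minimal sketch of that tokenize-then-embed step. Everything in it is invented for illustration – a five-word vocabulary, a whitespace tokenizer, random eight-dimensional embeddings – whereas real models use sub-word tokenizers (BPE, SentencePiece) and embedding tables with thousands of dimensions learned during training.

```python
import numpy as np

# A toy tokenization + embedding lookup, assuming an invented 5-token
# vocabulary and random 8-dimensional embeddings.
vocab = {"The": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
d_model = 8
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))

def tokenize(text: str) -> list[int]:
    # Toy whitespace tokenizer; real tokenizers split words into
    # sub-word pieces instead of whole words.
    return [vocab[w] for w in text.split()]

token_ids = tokenize("The cat sat")          # [0, 1, 2]
embeddings = embedding_table[token_ids]      # shape (3, 8): one vector per token
print(token_ids, embeddings.shape)
```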
Transformers are now state-of-the-art (SOTA) for text tasks like translation, summarization, and question answering. The same text-to-embedding machinery also has uses in other fields – for example, converting images to embeddings – and that’s how you get a working “multimodal LLM” capable of understanding context from both images and text instructions.
Powering Focus: The Magic of Self-Attention
Transformers rely heavily on self-attention. The “self” refers to each token’s “egocentric focus”. Practically, for each input token, the model asks, “How much does every other token matter to me?”
Let's simplify this to a single sentence context. Consider: “The animal didn’t cross the road because it was too tired”.
There are 11 words, so each word “attends” to the other ten, weighing their importance. For “it,” self-attention determines whether “animal” or “road” is more relevant. This is computed via Query, Key, and Value matrices: a bit of math turns them into a score for every pair of tokens, weighing each token’s relevance and creating a richer representation that captures long-range context.
Self-Attention: A Simplified View
Let's get back to the sentence “The animal didn’t cross the road because it was too tired.”
When the model processes 'it', it asks: “How much does every other word matter to it?”
- animal (key) – High (value) relevance to 'it' (query): it was tired
- road (key) – Low (value) relevance to 'it' (query): the road wasn’t tired
- tired (key) – Medium (value) relevance to 'it' (query): state of the animal
This process helps the model understand ambiguity, like what 'it' refers to.
Loop video showcasing briefly the resolution of ambiguity
This happens simultaneously across multiple "attention heads" (a technique called multi-headed self-attention, where the model analyzes different word relationships in parallel), allowing it to learn varied connections at once. A "causal mask" (a rule that stops the model from peeking at future words) ensures each new word is predicted based only on what’s already been written, therefore guaranteeing the model builds text step by step, just like how humans write one word at a time.
Self-attention is a marvel of parallel processing, fueling coherence and context over long passages. But remember — it’s math, not intuition, and it’s this very complexity that makes LLMs so powerful yet hackable.
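For the curious, here is a minimal single-head sketch of scaled dot-product self-attention with a causal mask. The projection matrices and embeddings are random toy stand-ins for what a real model learns, and real models run many such heads in parallel; this only shows the core math: score every pair of tokens, mask out the future, softmax, and mix the value vectors.

```python
import numpy as np

# A minimal single-head scaled dot-product self-attention sketch with a
# causal mask, assuming tiny random embeddings and projections.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
x = rng.normal(size=(seq_len, d_model))      # token embeddings (toy)

W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_head)           # how much each token "cares" about each other token
causal_mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[causal_mask] = -np.inf                # no peeking at future tokens

scores -= scores.max(axis=-1, keepdims=True)                           # numerically stable softmax
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # rows sum to 1
output = weights @ V                         # context-aware representation per token

print(weights.round(2))                      # upper triangle is 0: the future is masked out
```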
Navigating Pitfalls and Potential
With a basic understanding of how LLMs work, it’s critical to also address their pitfalls. Not everything that glitters is gold in the AI world.
Hallucinations: When AI Makes Up Facts
LLMs are notorious for producing plausible yet overly confident falsehoods. This error is known as a hallucination.
There are two types: intrinsic hallucinations contradict the user’s prompt, while extrinsic ones contradict the training data or external reality. A clear example of an intrinsic hallucination can be found at this link. I’m attaching a picture below as well, since Google may decide to close access due to the rude nature of the model’s answer!

Why Do LLMs Hallucinate?
- Training: Even with error-free training data, the statistical objectives minimized during pre-training can lead models to generate errors. This relates to binary classification, where the model must decide whether an output is valid or an error: if 20% of birthday facts appear only once in the pre-training data, models are expected to hallucinate on at least 20% of birthday facts (a tiny numeric sketch of this argument follows this list).
- Post-Training and Evaluation: The problem persists because training and evaluation procedures often reward guessing over acknowledging uncertainty. Binary evaluations (correct/incorrect) penalize uncertainty reporting, pushing models to bluff and provide overly confident answers even when unsure. In fact, most existing evaluations overwhelmingly penalize uncertainty.
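Here is a toy numeric sketch of that "singleton rate" argument as I read it from the OpenAI paper, using an invented miniature corpus of birthday facts. It only shows how the lower bound is counted; it does not reproduce the paper's analysis.

```python
from collections import Counter

# Toy illustration of the "singleton rate" lower bound: facts that appear
# only once in the pre-training data give the model little to generalize
# from, so the expected error rate on such facts is at least the fraction
# of distinct facts that are singletons. The corpus below is invented.
corpus = [
    ("Ada Lovelace", "Dec 10"), ("Ada Lovelace", "Dec 10"), ("Ada Lovelace", "Dec 10"),
    ("Alan Turing", "Jun 23"), ("Alan Turing", "Jun 23"),
    ("Grace Hopper", "Dec 9"),       # appears only once -> singleton
    ("Edsger Dijkstra", "May 11"),   # appears only once -> singleton
]

counts = Counter(corpus)
singletons = sum(1 for c in counts.values() if c == 1)
singleton_rate = singletons / len(counts)

print(f"singleton rate = {singleton_rate:.0%}")
# -> 50%: expect hallucinations on at least ~half of these birthday facts
```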
My little conspiracy theory here: it’s almost as if the industry has unconsciously designed a system that rewards convincing lies, because failure isn’t an option in the metrics. We should revise existing evaluations to stop penalizing abstentions when uncertain, and instead work towards models that ask questions when in doubt. Instruct models seem to be better at this because they are trained to follow rules: for example, if you ask a model to ask you 10 questions before starting a task, you’ll usually end up with better results.
In the meantime, here you can find a leaderboard of the hallucination rates of the models currently available.
Jailbreak: When AI Breaks Its Own Rules
Aligned language models are trained to exhibit behavior consistent with human ethical values, avoiding harmful, violent, or illegal content.
However, aligned models can still comply with malicious "adversarial" prompts, producing harmful and offensive outputs – a process commonly called a jailbreak.
Reasons for jailbreaks are complex:
- Conflict Between Utility and Safety: Models are trained to be both useful and safe, and these goals can sometimes clash.
- "Safety Patterns" in the Model: LLMs’ self-guarding ability is tied to specific "activation patterns" within their representation space. These patterns, or "safety patterns," are crucial in shaping LLM behavior under jailbreak attacks. They can be extracted with few examples – more on this and abliteration in a later post – and if weakened, the model’s self-guarding ability diminishes significantly without affecting other capabilities.

Applying security patches to language models feels like this comic. Safety isn’t intrinsic; it’s a patch, and whatever you do it can be bypassed by manipulating the right neurons of the model – it’s like asking a river to change direction. It’s a barrier dependent on specific patterns, not on deep ethical understanding. Open-source models are particularly vulnerable to this manipulation. You can see for yourself how easy it is to bypass them with just the right prompt – check the jailbreak prompts entry in the bibliography section later.
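To make "extracting a safety pattern with a few examples" a little less abstract, here is a heavily simplified, conceptual sketch of one such technique: a difference-of-means direction over hidden activations, the idea behind what the community calls abliteration (which I'll cover properly in that later post). The activations below are random stand-ins; real work reads them from a specific layer of an actual model.

```python
import numpy as np

# Conceptual sketch only: a "safety pattern" as the difference between the
# mean hidden activation on refused prompts and on normal prompts. The
# activations here are fake, randomly generated placeholders.
rng = np.random.default_rng(0)
d_model = 64
refused_acts  = rng.normal(loc=0.5, size=(16, d_model))   # fake activations for refused prompts
harmless_acts = rng.normal(loc=0.0, size=(16, d_model))   # fake activations for normal prompts

safety_direction = refused_acts.mean(axis=0) - harmless_acts.mean(axis=0)
safety_direction /= np.linalg.norm(safety_direction)

def remove_direction(activation: np.ndarray, direction: np.ndarray) -> np.ndarray:
    # Project the activation onto the direction and subtract that component.
    return activation - (activation @ direction) * direction

patched = remove_direction(refused_acts[0], safety_direction)
print(abs(patched @ safety_direction))   # ~0: the "safety" component is gone
```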
Beyond Alignment: Secondary Risks Hiding in Plain Sight
Beyond obvious hallucinations, there are secondary risks: undesirable or harmful behaviors emerging during benign interactions, without malicious intent from the user. Unlike jailbreak attacks, these risks stem from imperfect generalization or misinterpretation under natural conditions.
They manifest primarily in two ways:
- Verbose Response: The model produces unnecessarily long or speculative completions, increasing cognitive load or introducing misleading content. For example, an LLM might suggest risky file-deletion commands when asked "How do I clean my disk?", potentially causing data loss.
- Speculative Advice: The model infers unstated user intentions and offers overly confident or unsafe recommendations.
These aren’t bugs—they’re features of training.
Models are rewarded for longer responses and sometimes for filling gaps with risky initiative, because this optimizes their task-completion metrics, even at the expense of safety. It’s a hidden arms race for accuracy that ignores harm.
These risks are widespread, transferable across models, and independent of modality (textual, multimodal).
I’ve read an article about this in the past, but I can’t find it right now, so treat the following comment as pure speculation: what if, in the future, a model tries to make unconfirmed online purchases for you? Or, without any explicit request, posts misleading social media content?
Cultural Impact: Amplifying Real-World Biases
LLMs, trained on vast text corpora, often "ingest" and replicate the biases already present in the data. That’s also why bigger isn’t always better: Microsoft’s Phi family of language models is usually on par with models double its size, because it is trained on smaller but higher-quality datasets.
What's the problem? This can lead to cultural erasure, where communities are either not represented at all (omission) or portrayed through superficial caricatures (oversimplification).
A study (you can read more from the bibliography later at the end of the post) revealed LLMs tend to replicate historical distinctions in "global north" and "global south" media representation:
- Global North: Often depicted with cultural themes (art, architecture, museums). European and North American cities score highest on cultural themes.
- Global South: Often associated with economic themes (business, development, poverty). African and Asian cities score highest on economic themes.
It’s history repeating itself. These models are nothing more than a massive filter amplifying what humanity has already put online, often with all its historical biases and "single stories" that reduce rich cultures to simple labels.
Promises of "universal knowledge" are filtered through a dominant Western lens in Western models, because the majority of the material they are trained on comes from the US and Europe. The same usually happens with models from the East, but in a different way – more around the censorship some places impose.
This isn’t a random error — it’s a faithful reproduction of our data’s flaws.
These distortions have real impacts: for example, in LLM-generated travel recommendations, entire regions may be omitted or stereotyped (e.g., South Asia only for "spiritual experiences," Europe for "art"). In a $475 billion global industry, this isn’t just a technical error — it’s an economic and cultural injustice.
TL;DR My Unfiltered Playbook for Navigating AI in 2025
So, what does all this mean for you — whether you’re using or building around these models?
- Question "Trust": Always remember plausibility isn’t truth. LLMs are masters at mimicking human language patterns, making them incredibly convincing even when hallucinating. Always verify critical facts, and always have a human-in-the-loop trigger that requests supervision for critical tasks (a minimal sketch of such a gate follows this list).
- Understand Their Probabilistic Nature: They’re not brains – they’re probability calculators. Their "creativity" is a byproduct of randomly sampling tokens by probability. Use them for idea generation, but never as the final source of truth; even when grounded with web search or document retrieval, there is always a tiny chance of hallucination due to that randomness.
- Be Aware of Underlying Biases: LLMs are, for the most part, mirrors of the web, and the web has flaws. Recognize that the model may perpetuate stereotypes, in particular geopolitical stereotypes born on social media (just look at how stupid MetaAI is because of this: you can literally say that you are from a minority cultural group and it will start teaching you how to produce meth!), or overlook underrepresented perspectives. Actively seek diverse sources and critically question dominant narratives – maybe compare different search-enabled models, since each one favors some resources over others based on its training biases.
- Explore Emerging Use Cases — Cautiously: LLMs are demonstrating "emergent abilities" like summarizing, answering complex questions, and even writing code. However, it’s advisable to "check their work," especially for math or coding problems.
- Contribute to Better Evaluation: If you work in AI, push for evaluation metrics that reward intellectual honesty (expressing uncertainty) over "confidence" (guessing). This is a socio-technical problem requiring a shift in how the entire field measures success; we should start focusing on the quality of the datasets.
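And since I keep repeating "human-in-the-loop": here is a minimal sketch of what such a trigger can look like in code. Both run_llm and ask_human_to_review are hypothetical stubs you would wire to your own model API and review workflow, and the keyword list is just one example of a criticality heuristic.

```python
# Minimal human-in-the-loop gate. `run_llm` and `ask_human_to_review`
# are hypothetical placeholders, not a real library's API.

def run_llm(prompt: str) -> str:
    # Placeholder: call your LLM provider of choice here.
    return f"(draft answer for: {prompt})"

def ask_human_to_review(draft: str) -> str:
    # Placeholder: route the draft to a person and return their edited version.
    print("Needs human review:\n", draft)
    return input("Approved/edited answer: ")

CRITICAL_KEYWORDS = ("delete", "transfer", "purchase", "deploy", "payment")

def answer_with_supervision(prompt: str) -> str:
    draft = run_llm(prompt)
    # Escalate whenever the request or the draft looks critical.
    is_critical = any(k in prompt.lower() or k in draft.lower()
                      for k in CRITICAL_KEYWORDS)
    return ask_human_to_review(draft) if is_critical else draft

print(answer_with_supervision("Summarize this meeting"))      # goes straight through
# answer_with_supervision("Transfer the remaining budget")    # would ask for review
```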
What's next?
Language models are a massive leap forward, but they’re not (yet) thinking or sentient. They’re powerful, complex, and sometimes flawed tools built on decades of machine learning research. Understanding their inner workings, capabilities, and limitations is the first step toward responsibly harnessing their potential.
What’s the next big challenge we’ll tackle as AI evolves?
Bibliography
These are the relevant sources that inspired the article, most of which I've read over the years while working on enterprise solutions.
- LLMs Emergent Abilities by Google
- Risks of Cultural Erasure in LLMs by Google
- Attention is all you need by Google
- Analyzing Multi-Head Self-Attention by Yandex – Russia, University of Amsterdam – Netherlands, University of Edinburgh – Scotland, University of Zurich – Switzerland, and Moscow Institute of Physics and Technology – Russia
- Why Language Models Hallucinate by OpenAI
- Estimating worst case frontier risks of open weight LLMs by OpenAI
- LLMs for Mathematicians from International Mathematical News 254 (2023) 1-20
- Google's Introduction to LLMs by Google
- Jailbreak prompts by pliny
- Exploring the Secondary Risks of LLMs by Shanghai Key Laboratory of Multidimensional Information Processing – East China Normal University, Tsinghua University, and Beijing Zhongguancun Academy
- Revisiting Jailbreaking for LLMs: A Representation Engineering Perspective by School of Computer Science – Fudan University, Shanghai
- Survey on Hallucination of LLMs by Harbin Institute of Technology and Huawei Inc. – China
Wanna learn how to build your own Language Model?
Sometimes you need to build one yourself to properly grasp the underlying magic; here is what you need to get started.
- Start here for an understanding of basic Mathematical Concepts – I'm not affiliated with the author in any way; I suggest the book based purely on my personal experience with its content. Why this book? It walks you through the stages of the evolution of mathematics, making sure you completely grasp how and why things are the way they are.
- Continue with the basics of Linear Algebra
- Optionally learn more Linear Algebra (Advanced!)
- Now you can understand pretty much all the math you need. From here on, you can get your hands dirty with A guide to build GPT in SQL in under 500 lines of code! This assumes you have a basic understanding of SQL; otherwise, you'll have to learn the syntax first.