ELI5: How Do LLMs Work? Part 1 — What the Hell?
Well, there’s no hiding from it. LLMs are here to take our jobs, and possibly our wives and pets. In an effort to know thy enemy before it destroys us, let’s try to figure out how this stuff works.
In this run of articles from our “ELI5” series, which aims to take a candid but informed look at today’s technologies with no soft, mushy buzzwords, we dive straight into the world of LLMs.
This series will take us through foundational LLM concepts, explore their inner workings, provide further reading resources, and maybe we’ll even dip our toes into existentialism and fear-mongering toward the end, if we have time.
Alright, time’s a-wasting. Let’s get started.
Buzzword Breakdown
There are a lot of words out there. So let’s define some key terms before we go any further. These go roughly from most broad to most specific.
- Artificial Intelligence (AI): The big umbrella term. AI refers to any system that attempts to mimic human intelligence to perform tasks — whether that’s recognizing speech, playing chess, or generating memes. Not all AI is smart; some of it is just smoke and mirrors. But if it tries to “act human,” it probably falls under this category.
- Machine Learning (ML): A subfield of AI. Machine learning is what happens when we stop hard-coding rules and instead let algorithms learn patterns and rules directly from data. It’s less “tell the computer what to do” and more “teach the computer to figure it out.”
- Neural Network: A neural network is a system of connected “neurons” (basically just mathematical functions) that pass signals to each other. You feed data into the first layer of the neural network — a layer being a group of neurons that work together to transform input into output — it passes through a bunch of other layers, and you get some kind of output at the end. With enough layers and data, a neural network can produce outputs that recognize incredibly subtle patterns.
- Deep Learning: A subset of machine learning. Deep learning uses neural networks to model complex patterns in data. It’s especially good at tasks like image recognition, voice synthesis, and language understanding. It’s called “deep” because the neural networks required for deep learning often have many, many layers.
- Natural Language Processing (NLP): A field focused on enabling machines to understand and work with human language. NLP covers everything from translating French to English, to analyzing the tone of a tweet, to auto-completing your email.
- Large Language Model (LLM): A specific kind of deep learning model designed for NLP. An LLM is trained on massive amounts of text — books, articles, tweets, Reddit comments, whatever — and learns statistical patterns in language. Once trained, it can generate new text, answer questions, write code, and much more. LLMs are basically what you get when you apply deep learning to huge piles of language data.
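To make the “neurons are basically just mathematical functions” point concrete, here’s a minimal sketch of a two-layer neural network forward pass in plain Python. The weights and biases below are made-up numbers chosen for illustration; in a real network, training is the process of learning these values from data.

```python
def neuron(inputs, weights, bias):
    """One 'neuron': a weighted sum of its inputs, squashed by a nonlinearity."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)  # ReLU activation: negative signals become zero

def layer(inputs, weight_rows, biases):
    """A layer is just a group of neurons that all see the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Feed a 2-number input through a hidden layer, then an output layer.
# All weights here are hand-picked, not learned.
hidden = layer([1.0, 2.0],
               weight_rows=[[0.5, -0.2], [0.3, 0.8]],
               biases=[0.1, -0.1])
output = layer(hidden,
               weight_rows=[[1.0, -1.0]],
               biases=[0.0])
print(output)  # → [0.0]
```

Stack many more layers (and learn the weights instead of hand-picking them) and you have deep learning.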
What Even Is an LLM, Anyway?
First things first: LLM stands for “Large Language Model.” At its core, it’s just a machine learning model trained on a massive amount of data. These models learn patterns in that data to do things like read, understand, and respond to text, images, audio, or other inputs.
Now, you might have heard of different kinds of LLMs: GPT, BERT, and so on. They’re all part of the same family, but they work a bit differently. For example, GPT is a type of LLM built mainly for generating text by predicting the next word in a sequence, one after another. BERT, on the other hand, is designed more for understanding text, which helps with things like search and classification. We’ll get into more detail on these models in subsequent articles.
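To give a flavor of what “predicting the next word, one after another” means, here’s a toy sketch: it counts which word tends to follow which in a tiny made-up corpus, then generates text by repeatedly appending the most likely successor. Real GPT-style models use giant neural networks instead of raw counts, but the generation loop has the same basic shape.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def generate(word, steps):
    """Repeatedly append the most likely next word."""
    out = [word]
    for _ in range(steps):
        if word not in following:
            break  # we've never seen anything follow this word
        word = following[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(generate("the", 4))  # → "the cat sat on the"
```

Swap the word counts for a model with billions of learned parameters (and pick the next word with a bit of randomness instead of always taking the top one) and you’re surprisingly close to how text generation actually works.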
Despite those differences, they’re all basically just pattern-recognition machines trained on huge piles of data. They don’t know whether something is right or wrong, and they definitely don’t possess “consciousness” — unless you consider a giant matrix of numbers to be conscious. Then again, if you’re feeling philosophical, maybe that isn’t all that different from us humans.
Further reading suggestion: You Look Like a Thing and I Love You by Janelle Shane.
A Brief History of LLMs
LLMs certainly feel like they came out of nowhere. One minute I’m sifting through StackOverflow threads, and the next I’m pasting a stack trace into an AI-powered IDE and saying "plz fix" like I’m talking to an intern polishing a PowerPoint. But these models didn’t magically appear — they’re the result of decades of research, and hardware that finally caught up to the theory.
AI research, as we know it, began in the 1950s with early systems that tried to mimic human reasoning using logic and rules — think theorem provers and chess programs. These early projects were ambitious but brittle, and when they failed to deliver widespread success, funding dried up. This period of stalled progress became known as the AI Winter.
There were intermittent AI successes here and there — important milestones that pushed the field forward — but they weren’t LLMs. No deep learning or anything like that. Take 2011, when IBM Watson won Jeopardy! against Ken Jennings and Brad Rutter using a combination of keyword matching, database lookups, and early NLP techniques.
The real game-changer came in 2017, when Google researchers published the paper Attention Is All You Need, introducing the transformer architecture. Transformers can process an entire input sequence at once using a mechanism called "self-attention". They were fast, parallelizable, and shockingly good at understanding language structure, solving key limitations of earlier models based on recurrent neural networks (RNNs), which processed text sequentially and often struggled to capture long-range dependencies or parallelize efficiently.
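Self-attention sounds mystical, but at its core it’s a weighted average: each position in the sequence scores itself against every other position, turns those scores into weights with a softmax, and blends everyone’s values together accordingly. Here’s a stripped-down sketch in plain Python, using hand-picked toy vectors and treating queries, keys, and values as the same embeddings (real transformers learn separate projections for each):

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Each position attends to every position at once (no sequential loop over time)."""
    dim = len(embeddings[0])
    output = []
    for query in embeddings:
        # Score this position against every position (scaled dot product).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in embeddings]
        weights = softmax(scores)
        # Blend all the positions' vectors according to the attention weights.
        mixed = [sum(w * vec[i] for w, vec in zip(weights, embeddings))
                 for i in range(dim)]
        output.append(mixed)
    return output

# Three toy "token embeddings" of dimension 2, all processed in one pass.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = self_attention(tokens)
print(result)  # each row is a weighted blend of all three inputs
```

Because every position’s output depends only on the full set of inputs (not on the previous position’s output), all rows can be computed in parallel — which is exactly why transformers train so much faster than RNNs.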
The transformer architecture inspired a flood of transformer-based LLMs. OpenAI, founded back in 2015 with the mission of building safe, widely beneficial artificial general intelligence, released several of them — including the models behind the popular ChatGPT, which is just an interface to the company’s underlying LLMs (like GPT-3.5 and GPT-4). Since then, everyone’s jumped in. Meta released LLaMA, Anthropic built Claude, Google launched Gemini, and Hugging Face made thousands of open models easy to download and run. LLMs stopped being a research novelty and became everyday tools — from coding assistants to customer support bots and everything in between.
LLMs didn’t come out of nowhere. But now that they’re here, they’re everywhere.
Conclusion
I expect this article might have left you with more questions than answers. What is ChatGPT? What’s a GPT, anyway? Who — or what — the hell is BERT? Yeah... we’ll get to all of those in good time.
But here’s the big picture: LLMs aren’t magic. They’re basically just extremely sophisticated pattern-recognition machines powered by math, statistics, and a ton of training data. They work by processing inputs — streams of text, audio, pictures — and producing outputs that reflect what they’ve seen before.
In Part 2, we’ll start getting more hands-on. We’ll dig into some foundational concepts like tokenization and embeddings — the raw ingredients that feed into a model’s brain — before we dive into the wild world of transformers, attention, and how these models actually learn.
Strap in. It’s going to be a wild and nerdy ride.

The team at /dev/null Digest is dedicated to offering lighthearted commentary and insights into the world of software development. Have opinions to share? Want to write your own articles? We’re always accepting new submissions, so feel free to contact us.