Pillar 1 · Module 5 (Capstone)

Key Concepts Glossary

By Robert Triebwasser · Foundations · Level 3 · ~16 min read

The Foundations pillar ends here. This is the reference you come back to — the fourteen words the AI industry keeps using without explaining, each one defined in plain English by someone who actually uses this stuff for work. Not a textbook glossary. An operator's glossary. When you close this page, you own the vocabulary, and every article or TikTok you read afterwards stops sounding like a foreign language.

Watch: Fourteen Words, Explained Plainly

Video coming from Robert soon — the written reference is below

Token

A token is the smallest chunk of text a model reads or writes. Roughly four characters or three-quarters of a word. “Cat” is one token, “hippopotamus” is two or three, “don't” is two because the apostrophe breaks it. The model doesn't read characters and it doesn't read words exactly — it reads tokens, which are the middle ground.

Why you care: everything a tool charges you for is counted in tokens. Every context window is measured in tokens. When someone says “a 200K context model,” they mean 200,000 tokens, which is roughly 150,000 words. When someone says a task “used 10 million tokens,” they mean the model read and wrote a lot over the course of that task. Tokens are the unit of both cost and memory.
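To make the “tokens are the unit of cost” point concrete, here's a back-of-envelope sketch using the ~4 characters per token rule of thumb from above. Real tokenizers work differently (they split on learned subword pieces, not raw character counts), and the price is a placeholder, so treat this as an estimate, not an invoice.

```python
# Rough token estimator using the ~4 chars/token rule of thumb.
# Real tokenizers (BPE-style) will give somewhat different counts.

def estimate_tokens(text: str) -> int:
    """Approximate token count: roughly one token per 4 characters."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, dollars_per_million_tokens: float) -> float:
    """Rough input cost at a hypothetical per-million-token price."""
    return estimate_tokens(text) / 1_000_000 * dollars_per_million_tokens

prompt = "Summarize this quarterly report for a busy executive."
print(estimate_tokens(prompt), "tokens (approx.)")
```

Run it on a long document you actually work with and the per-million-token pricing in tool comparisons suddenly means something.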

Context

Context is everything the model can “see” when it's deciding its next word: your current message, every previous message in the conversation, the system prompt, any files you've pasted or uploaded, and the response it's in the middle of generating. All of that lives inside the context window, which is the maximum number of tokens the model can hold at once.

Why you care: anything outside the context window does not exist to the model. If you told it something ten thousand words ago and you're now past its window, the thing you told it is gone. It didn't “forget” — it never had access. This is why long chat conversations start feeling vague: the early stuff scrolled off the top. Read Module 2 for the longer explanation.
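Here's a minimal sketch of the “scrolled off the top” effect. Chat apps have to fit the whole conversation into a fixed token budget, so the oldest turns get dropped first. The token counts below are made up for illustration; real apps use actual tokenizer counts and sometimes smarter strategies like summarizing old turns.

```python
# Why old messages "scroll off": the conversation must fit a fixed
# token budget, so the oldest turns are dropped first.

def fit_to_window(messages, budget):
    """Keep the most recent messages whose total tokens fit the budget."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest-first
        if used + msg["tokens"] > budget:
            break                           # everything older is dropped
        kept.append(msg)
        used += msg["tokens"]
    return list(reversed(kept))             # restore chronological order

history = [
    {"text": "early setup details", "tokens": 800},
    {"text": "long pasted document", "tokens": 5000},
    {"text": "recent question", "tokens": 200},
]
print(fit_to_window(history, budget=4000))
```

Notice that the early setup details vanish the moment the big pasted document blows the budget. The model answering your recent question literally never sees them.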

Prompting

Prompting is the practice of writing instructions for a language model in a way that gets good output. It's a real skill, but it's been dramatically overhyped. You don't need a 40-page “prompt engineering course.” You need three habits:

  1. Describe the outcome, not the implementation. “Rewrite this so a busy executive understands it in thirty seconds” is better than “use simple words, make it shorter.”
  2. Give context. Tell the model who you are, who the reader is, and what this is for. The same question with no context produces generic output. With context, it produces targeted output.
  3. Show, don't explain. If the output shape matters, paste an example of the shape you want. Three sentences of example beats ten sentences of description.

That's 80% of it. The remaining 20% is knowing which model you're talking to and what its quirks are.

System Prompt

A system prompt is an instruction the tool inserts at the very beginning of every conversation, before your first message, that tells the model who it is and how to behave. It's the “you are a helpful assistant who...” line you've probably seen in tutorials.

Most consumer chat apps hide their system prompts. If you're using ChatGPT or Claude.ai, there's a system prompt in there you never see — defining tone, safety rules, formatting conventions. When you build something with the API, the system prompt is yours to write. In Claude Code the equivalent is your CLAUDE.md file, which gets injected at the start of every session.

Why you care: the system prompt is where you encode standing rules. “Always respond in plain English.” “Never invent sources.” “Match this voice guide.” Rules you set in a system prompt are stickier than rules you set in a single message.
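Under the hood, a system prompt is usually just a message tagged with a special role, placed before your first turn. The sketch below is a generic illustration of that shape; real providers differ in the details (some use a `messages` entry with role `"system"`, others a separate `system` field), and the model name here is a placeholder.

```python
# A generic illustration of how a system prompt travels with a
# request: a role-tagged message in front of the user's first turn.
# This is NOT any specific vendor's exact schema.

import json

def build_request(system_prompt: str, user_message: str) -> dict:
    return {
        "model": "example-model",  # placeholder, not a real model name
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    }

req = build_request(
    "Always respond in plain English. Never invent sources.",
    "Summarize this memo for me.",
)
print(json.dumps(req, indent=2))
```

The standing rules ride along with every single request, which is why they stick better than rules buried in one chat message that can scroll out of the window.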

Temperature

A number between 0 and 1 (or sometimes higher) that controls how much the model samples from its top guesses versus the less-likely ones when picking the next token. Low temperature (0 to 0.3) produces predictable, repeatable output. High temperature (0.7 and up) produces varied, sometimes surprising output.

Common mistake: thinking temperature is an accuracy dial. It is not. A low-temperature model is just as capable of hallucinating as a high-temperature one — it'll just hallucinate the same plausible wrong answer every time. Temperature controls variety, not truth. Module 2 has the longer version.
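If you want to see the dial itself, here's the math in miniature. The model scores every candidate next token; temperature rescales those scores before they're turned into probabilities. Low temperature sharpens the distribution toward the top guess, high temperature flattens it. The scores below are invented for the demo.

```python
# Temperature rescales next-token scores before sampling.
# Low T -> sharper (near-deterministic); high T -> flatter (varied).

import math

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]   # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                       # made-up raw scores
cold = softmax_with_temperature(logits, 0.2)   # top choice dominates
hot = softmax_with_temperature(logits, 1.5)    # alternatives stay live
print(round(cold[0], 3), round(hot[0], 3))
```

Note that the same wrong answer can sit at the top of both distributions. The dial changes how often the model picks it, not whether it's true.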

Inference

The moment of actually running a trained model to produce output. When you hit Enter and watch the response stream back token by token, that's inference. The counterpart to inference is training, which is the multi-month process of teaching the model in the first place.

Why you care: inference is what you pay for. Training is done by the labs; inference is done every time you or your tool sends a request. The API pricing you see in tool comparisons is all inference pricing, usually quoted as dollars per million tokens in and out.
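The billing arithmetic is simple enough to sketch. API prices are quoted per million tokens, usually with separate input and output rates. The rates below are hypothetical placeholders, not any vendor's actual pricing.

```python
# Inference billing: separate per-million-token rates for input
# (what you send) and output (what the model writes back).

def inference_cost(input_tokens, output_tokens,
                   input_rate_per_m, output_rate_per_m):
    return (input_tokens / 1_000_000 * input_rate_per_m
            + output_tokens / 1_000_000 * output_rate_per_m)

# A 10,000-token prompt and a 2,000-token answer at hypothetical
# rates of $3/M input and $15/M output:
cost = inference_cost(10_000, 2_000, 3.0, 15.0)
print(f"${cost:.3f}")
```

Two things to notice: output tokens usually cost several times more than input tokens, and long conversations re-send the whole history as input on every turn, which is why chatty sessions get expensive.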

Hallucination

When a model generates text that sounds authoritative, is delivered with confidence, and is factually wrong. The fake citation. The invented URL. The quote nobody ever said. Hallucinations are not a bug — they're a direct consequence of how language models generate plausible text whether or not they have the underlying facts.

Module 4 is the full field guide to spotting and preventing hallucination in your own work. The one-sentence version: if a model's output contains a name, a number, a date, a quote, or a citation, verify it against a real source before you use it.

Fine-Tuning

Taking a model that's already been pre-trained on the internet and then training it further on a specific dataset to make it better at a specific task. Fine-tuning is how a raw, feral pre-trained model becomes a helpful chatbot — it's the second phase of training where humans show the model examples of good behavior.

Businesses sometimes fine-tune models on their own data — customer emails, support tickets, internal documentation — to make the model better at their specific use case. This used to be essential; in 2026 it's less common because good prompting plus a strong base model usually gets close enough for most tasks. Fine-tuning is worth it when you have a very specific output format, a very narrow domain, or a large volume of proprietary examples that wouldn't fit in a prompt.

Embeddings

An embedding is a way of turning a piece of text into a list of numbers that represents its meaning. Two pieces of text that mean similar things will have similar lists of numbers; two pieces of text that mean different things will have dissimilar lists. You can compare embeddings mathematically to find out how similar two texts are — which turns out to be the foundation of most “AI search” features.

Why you care: any tool that advertises “semantic search” or “AI-powered search” is almost certainly using embeddings. The tool computes an embedding for every document it knows about, then computes an embedding for your search query, then finds documents with similar embeddings. It's how AI search can find a document that contains the idea you asked about without containing your exact keywords.
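The comparison step is usually cosine similarity: the more two vectors point in the same direction, the closer the meanings. The three-number vectors below are toy values I made up; real embeddings have hundreds or thousands of dimensions and come from an embedding model, but the comparison works the same way.

```python
# Cosine similarity: the standard way to compare two embeddings.
# Toy 3-dimensional vectors; real embeddings are far larger.

import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vacation = [0.9, 0.1, 0.2]   # toy embedding for "time off request"
holiday  = [0.8, 0.2, 0.3]   # toy embedding for "holiday booking"
invoice  = [0.1, 0.9, 0.1]   # toy embedding for "unpaid invoice"

print(cosine_similarity(vacation, holiday) > cosine_similarity(vacation, invoice))
```

That one comparison is the whole trick: “time off request” and “holiday booking” share no keywords, but their vectors sit close together, so semantic search finds the match anyway.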

RAG

Short for Retrieval-Augmented Generation. A pattern where, instead of relying only on what the model was trained on, the system first retrieves relevant documents from a database and then feeds them to the model alongside your question. The model's answer is then grounded in the retrieved documents rather than in the frozen training data.

RAG is how most corporate AI chatbots work. "Ask your company's internal docs" usually means: when you ask a question, the system finds the relevant internal documents (using embeddings — see above), stuffs them into the context window, and asks the model to answer based on them. The model is playing librarian, not oracle.

Why you care: RAG dramatically reduces (but does not eliminate) hallucination, because the answer is grounded in real documents the system can show you. When you evaluate a tool that claims to work with your company's data, ask whether it's using RAG, whether it can cite the source documents it used for each answer, and what happens when the relevant document isn't found. Those three questions tell you whether you're looking at a real RAG product or a thin wrapper that's going to hallucinate under pressure.
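Here's the whole RAG pattern in miniature, with the embedding model and the language model both stubbed out. The `embed` function below is a deliberately crude word-overlap stand-in (real systems call an embedding model, as described above), and all the function names are illustrative, not a real library's API.

```python
# The RAG loop, schematically: retrieve the most relevant docs,
# stuff them into the prompt, ask the model to answer from them.
# embed() is a crude stand-in for a real embedding model.

def embed(text):
    """Stub 'embedding': the set of lowercase words in the text."""
    return set(text.lower().split())

def similarity(a, b):
    """Jaccard word overlap, standing in for cosine similarity."""
    return len(a & b) / max(1, len(a | b))

def retrieve(query, documents, top_k=2):
    q = embed(query)
    ranked = sorted(documents, key=lambda d: similarity(q, embed(d)),
                    reverse=True)
    return ranked[:top_k]

def build_rag_prompt(query, documents):
    sources = retrieve(query, documents)
    context = "\n".join(f"- {s}" for s in sources)
    return (f"Answer using ONLY these documents:\n{context}\n\n"
            f"Question: {query}\n"
            f"If the documents don't contain the answer, say so.")

docs = [
    "Expense reports are due by the 5th of each month.",
    "The office closes at 6pm on Fridays.",
    "Travel must be booked through the internal portal.",
]
print(build_rag_prompt("When are expense reports due?", docs))
```

The last line of the prompt template is the part most thin wrappers skip: telling the model what to do when retrieval comes back empty is what separates “grounded answer” from “confident hallucination.”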

Agents

An agent is a language model running in a loop, with tools it can use, that can take actions in the world without a human approving each one. The distinction from a chatbot is important: a chatbot answers a question, you read the answer, you decide what to do. An agent does things — reads files, calls APIs, writes to databases, sends emails — and you only review the outcome.

Claude Code is an agent. You tell it what you want and it reads files, edits files, runs commands, and commits changes without you approving each individual step. Autonomous research agents, coding agents, and “browser agents” that can navigate websites on your behalf are all the same idea.

Why you care: agents are where the current edge of practical AI usage lives. They're much more useful than chatbots for any task that requires more than a single step — which turns out to be most real work. They're also where the risks get real, because an agent can take actions that a chatbot cannot. Always supervise an agent the first few times you run it on a new kind of task.
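The “model running in a loop with tools” definition fits in a few lines of code. In the sketch below, `fake_model` returns canned decisions so the example runs on its own; a real agent replaces it with an actual LLM call that decides what to do next based on the observations so far.

```python
# The agent pattern in miniature: decide -> act -> observe -> repeat,
# until the model says it's done or a step limit is hit.
# fake_model is a canned stand-in for a real LLM call.

def fake_model(task, observations):
    """Stub decision-maker: read a file first, then wrap up."""
    if not observations:
        return {"action": "read_file", "arg": "report.txt"}
    return {"action": "finish",
            "arg": f"Summary of {len(observations)} observation(s)"}

def run_agent(task, tools, max_steps=5):
    observations = []
    for _ in range(max_steps):             # always cap an agent's loop
        decision = fake_model(task, observations)
        if decision["action"] == "finish":
            return decision["arg"]
        result = tools[decision["action"]](decision["arg"])
        observations.append(result)        # tool result feeds the next turn
    return "Gave up: step limit reached"

tools = {"read_file": lambda path: f"(contents of {path})"}
print(run_agent("Summarize report.txt", tools))
```

Two design choices worth noticing even in the toy version: the step cap, because a looping model with tools needs a leash, and the fact that tool results flow back into the next decision, which is what makes it an agent rather than a one-shot chatbot.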

Tool Use / Function Calling

Tool use (also called function calling) is the mechanism that lets a language model invoke external functions during a conversation — the feature that turns a chatbot into an agent. You define a set of tools ("the model can search the web," "the model can read this file," "the model can call this API"), and the model decides, during a conversation, when to call each one.

When you see a model “using its calculator” or “browsing the web” or “reading a file,” that's tool use. The model isn't computing the math itself — it's calling out to a real calculator. It isn't remembering the file contents — it's calling out to a file reader. The text you see is the result of the tool call being stitched back into the conversation.

Why you care: tool use is how all the “AI that takes actions” products actually work under the hood. Understanding this makes the news about “AI agents” much less mysterious — it's all the same pattern, variations on which tools are available.
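Schematically, a tool call is just structured output: the model emits “call this function with these arguments,” your code runs the real function, and the result goes back into the conversation. The request format below is a generic illustration, not any vendor's exact wire format.

```python
# What a tool call looks like, schematically: the model emits a
# structured request, the harness dispatches it to real code, and
# the result is stitched back into the conversation.

def calculator(expression: str) -> str:
    # Real products sandbox this properly; eval() is just for the demo.
    return str(eval(expression, {"__builtins__": {}}))

AVAILABLE_TOOLS = {"calculator": calculator}

# The model's tool-call request, as structured output:
model_request = {"tool": "calculator", "arguments": {"expression": "17 * 24"}}

def execute_tool_call(request):
    fn = AVAILABLE_TOOLS[request["tool"]]
    return fn(**request["arguments"])

result = execute_tool_call(model_request)
print(result)  # this string goes back to the model as the tool result
```

The model never runs the multiplication itself. It only decides that a calculator is needed and formats the request; the arithmetic happens in ordinary code.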

MCP

Short for Model Context Protocol. An open standard, originally introduced by Anthropic in 2024, for how AI agents connect to external tools and data sources. Think of MCP as a universal plug: instead of every tool needing a custom integration with every AI product, the tool exposes an MCP server and any MCP-aware agent can connect to it. It's roughly analogous to how USB made it possible to plug any device into any computer.

In 2026, MCP is widely adopted across the industry — Claude Code, ChatGPT, and other agentic tools support it. If you see an AI tool advertising an MCP connector for some service (Obsidian, GitHub, Slack, Gmail, whatever), that means the agent can read from and write to that service using the MCP standard.

Why you care: MCP is the reason the agent ecosystem is suddenly growing so fast. It lets the agent you're using reach into new systems without waiting for a custom integration. You don't need to understand how MCP works under the hood — but when you see the acronym, it means “this tool knows how to talk to other tools through a shared standard,” and that's a meaningful capability signal.

Hooks

A hook is a script that runs automatically at a specific event during an agent session. In Claude Code, hooks can fire at session start, at session end, or after every tool use — letting you inject context, save state, or enforce rules without writing a prompt for each event.

The session-start hook that reads your Obsidian vault and injects your context into every new Claude Code session is one of the most powerful examples. The session-end hook that automatically writes a daily journal entry is another. Hooks are what turn a raw agent into a personalized agent that fits your actual workflow.

Why you care: if you adopt an agent tool and you want it to be your agent — not the generic version — hooks are the customization layer. The second-brain workflow post in the Blueprint membership walks through the exact hooks I use.

Capstone Exercise: Your First Agentic Prompt

You've now read the whole Foundations pillar. You know what a language model is. You know how one works. You know the current map of who builds them and who to reach for. You know the five ways they fail. And you know the vocabulary. One more thing before you go: a hands-on exercise that puts it all together. Fifteen minutes. You'll feel the difference.

Here's the exercise. Pick one of the frontier tools you have access to (Claude.ai, ChatGPT, Gemini, Copilot — any of them). Open a fresh conversation. Paste the following system-prompt-style instructions as your first message:

I'm going to ask you to help me with something real from my work. Before you answer, I want you to follow these rules:

  1. If I ask for numbers, dates, names, or citations, tell me explicitly which ones you're confident about and which you would need to verify.
  2. If my question is based on an assumption you think is wrong, push back before answering. Don't just validate me.
  3. When you're done, summarize three things: what you're confident in, what you're uncertain about, and what I should verify before I act on your answer.

Ready? Here's the thing I need help with: [describe a real task from your work].

Do it with a real task — not a made-up one. Something you were going to work on this week anyway. A presentation you're drafting, a decision you're weighing, an analysis you need to run.

When the model responds, notice the shape of the answer. It should feel different from a normal chat. It should feel like the model is thinking about its own uncertainty, not just producing output. The three-part summary at the end — confident, uncertain, verify — is the key. That summary is the difference between AI as a word generator and AI as a useful collaborator. It's the workflow I run on anything that matters.

That's the capstone. Five modules, fourteen words, one habit: treat the model as an overconfident intern with an encyclopedic memory, and make it tell you what it's guessing about. Now go do real work with it.

You finished the Foundations pillar. What next?

Most people go to the Workflows pillar next, where I show you the exact patterns I use with these tools for real work. Or book a free 15-minute call and I'll help you pick a starting point.