AI & AIO Glossary

Key terminology for understanding AI systems and AI optimisation. Reference this glossary as you work through the learning tracks.

A

AIO (AI Optimisation)

The practice of optimising content and technical infrastructure to improve brand visibility in AI assistant responses. Similar to SEO but focused on how AI systems understand and represent your brand.

AI Overviews

Google's AI-generated summaries that appear at the top of search results. Powered by Gemini and formerly known as SGE (Search Generative Experience).

C

Context Window

The maximum amount of text (measured in tokens) that an LLM can process at once, including both the prompt and the model's response. GPT-4 Turbo and GPT-4o have a 128K-token context window (~96,000 words), while Claude supports up to 200K tokens.

Crawl Budget

The number of pages a crawler will fetch from your site within a given timeframe. Managing crawl budget helps ensure AI crawlers reach your most important content.

E

E-E-A-T

Experience, Expertise, Authoritativeness, and Trustworthiness. The framework Google's Search Quality Rater Guidelines use to evaluate content quality, and one that is also thought to influence how AI systems perceive brand authority.

Entity

A distinct, identifiable thing (person, place, organization, product) that AI systems can recognize and associate with attributes. Proper entity definition helps AI understand your brand.

F

Fine-tuning

The process of further training a pre-trained LLM on specific data to improve its performance on particular tasks or domains.

G

Grounding

The technique of connecting AI responses to verified external data sources, reducing hallucinations by anchoring outputs in real information.

GPTBot

OpenAI's web crawler that collects data for training GPT models. Can be controlled via robots.txt.

H

Hallucination

When an AI generates confident-sounding information that is factually incorrect or fabricated. Common with topics the model has limited training data on.

J

JSON-LD

JavaScript Object Notation for Linked Data. Google's recommended format for implementing Schema.org structured data, helping AI systems understand the entities and relationships on your pages.
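A minimal sketch of what this looks like in practice, assuming a hypothetical company called Example Co: a Schema.org `Organization` object serialised with Python's standard `json` module and wrapped in the `<script type="application/ld+json">` tag used to embed JSON-LD in a page. The names and URLs are placeholders, and `to_json_ld_script` is an illustrative helper, not part of any library.

```python
import json

# Hypothetical organisation details (placeholders, not a real company).
organisation = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com",
    "logo": "https://www.example.com/logo.png",
    # sameAs links tie this entity to its profiles elsewhere on the web.
    "sameAs": [
        "https://www.linkedin.com/company/example-co",
        "https://x.com/exampleco",
    ],
}


def to_json_ld_script(data: dict) -> str:
    """Serialise a Schema.org object and wrap it in a JSON-LD <script> tag."""
    body = json.dumps(data, indent=2)
    # Escape "</" so a string value can never close the <script> element early;
    # "<\/" is still valid JSON and decodes back to "</".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_json_ld_script(organisation))
```

The printed tag goes into the page's HTML, commonly in the `<head>`.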

K

Knowledge Cutoff

The date when an AI model's training data ends. Events or information after this date aren't in the model's base knowledge (unless retrieved via web search).

Knowledge Graph

A database of entities and their relationships. Google's Knowledge Graph powers many search features and influences how Gemini understands entities.

L

LLM (Large Language Model)

A type of AI trained on massive text datasets to understand and generate human language. Examples include GPT-4, Claude, Gemini, and Llama.

llms.txt

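A proposed convention (not yet a formal standard) for a Markdown file served at /llms.txt that gives AI systems a concise summary of a site and links to its most useful content. Unlike robots.txt, it does not control crawler access. Still in early adoption.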
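Under the llms.txt proposal (llmstxt.org), the file is Markdown: an H1 with the site or project name, a blockquote summary, then H2 sections listing links as `- [title](url): notes`, with a section titled "Optional" reserved for links that can be skipped when context is short. The sketch below renders that layout from Python data, assuming the layout just described; the `Link` and `LlmsTxt` classes and the example site are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


@dataclass
class LlmsTxt:
    name: str
    summary: str
    # Maps each H2 section heading to the links listed under it (insertion order is kept).
    sections: dict[str, list[Link]] = field(default_factory=dict)

    def render(self) -> str:
        """Return the Markdown text to serve at /llms.txt."""
        lines = [f"# {self.name}", "", f"> {self.summary}"]
        for heading, links in self.sections.items():
            lines += ["", f"## {heading}", ""]
            for link in links:
                entry = f"- [{link.title}]({link.url})"
                if link.note:
                    entry += f": {link.note}"
                lines.append(entry)
        return "\n".join(lines) + "\n"


# Hypothetical site used only for illustration.
site = LlmsTxt(
    name="Example Co",
    summary="Example Co makes hypothetical widgets for small businesses.",
    sections={
        "Docs": [Link("Product overview", "https://www.example.com/overview.md", "What the widgets do")],
        "Optional": [Link("Company history", "https://www.example.com/history.md")],
    },
)
print(site.render())
```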

P

Parameters

The learned weights (and biases) in a neural network. "Llama 3 70B has about 70 billion parameters" refers to the number of these learned values. All else being equal, more parameters tend to mean more capability, although training data and methods also matter.
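To make the idea concrete, the sketch below counts the learned values in a small fully connected network: each layer has one weight per input-output pair plus one bias per output. The layer sizes are arbitrary illustration values; LLM parameter counts are tallied in the same spirit, only across far larger embedding, attention, and feed-forward layers.

```python
def dense_layer_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes: list[int]) -> int:
    """Total learned parameters in a fully connected network with the given layer widths."""
    return sum(dense_layer_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


# A toy network: 784 inputs -> 128 hidden units -> 10 outputs.
print(mlp_params([784, 128, 10]))  # 101770
```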

Parametric Knowledge

Information encoded in an LLM's weights during training, essentially its "memory." Contrasts with retrieved knowledge fetched in real-time.

Prompt

The input text given to an AI model. The quality and structure of prompts significantly affects the quality of AI responses.

R

RAG (Retrieval-Augmented Generation)

A technique where an AI retrieves relevant documents before generating a response, combining the model's knowledge with current external information. Used by Perplexity and ChatGPT with browsing.
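The retrieve-then-generate flow can be sketched in a few lines. This toy version is a deliberate simplification: it ranks documents by shared words rather than the embedding or search-index retrieval production systems use, and it stops at building the augmented prompt instead of calling a model. The documents (including the company facts) and the helper names are hypothetical.

```python
import re


def word_set(text: str) -> set[str]:
    """Lower-case word set; a crude stand-in for a real embedding or search index."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = word_set(query)
    ranked = sorted(documents, key=lambda d: len(q & word_set(d)), reverse=True)
    return ranked[:k]


def build_augmented_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble what a RAG system would send to the model: retrieved context, then the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical document store.
docs = [
    "Example Co was founded in 2015 and is based in Leeds.",
    "Example Co sells hypothetical widgets to small businesses.",
    "The weather in Leeds is often rainy.",
]
print(build_augmented_prompt("Where is Example Co based?", docs, k=1))
```

Swapping `retrieve` for a vector search and passing the returned prompt to an LLM gives the basic shape of a RAG pipeline.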

RLHF (Reinforcement Learning from Human Feedback)

A training technique where human raters evaluate AI outputs, and the model learns to produce responses humans prefer. Key to making AI assistants helpful and safe.

S

Schema.org

A collaborative vocabulary for structured data markup. Using Schema.org helps AI systems understand the entities, relationships, and content on your pages.

Semantic HTML

HTML that uses meaningful tags (like <article>, <nav>, <header>) to describe content structure, making it easier for AI to understand page organization.

SSR (Server-Side Rendering)

Rendering web pages on the server before sending them to the browser, so the initial HTML already contains the page content. Important for AI crawlers that don't execute JavaScript.
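A rough illustration of why this matters: the sketch uses Python's standard `html.parser` to pull out the text a crawler would see if it reads the raw HTML without running JavaScript. Both pages are hypothetical; the client-side rendered one only gets its content after `app.js` runs in a browser.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script>/<style>: roughly what a non-JS crawler can read."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def crawler_view(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Client-side rendered: the HTML response is an empty shell until app.js runs.
CSR_PAGE = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
# Server-side rendered: the same content is already in the HTML response.
SSR_PAGE = (
    '<html><body><div id="root"><h1>Example Co widgets</h1>'
    "<p>Hypothetical widgets for small businesses.</p></div>"
    '<script src="app.js"></script></body></html>'
)

print(repr(crawler_view(CSR_PAGE)))  # ''
print(repr(crawler_view(SSR_PAGE)))  # 'Example Co widgets Hypothetical widgets for small businesses.'
```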

T

Token

The basic unit of text that LLMs process. Roughly 4 characters or 3/4 of a word in English. "ChatGPT" might be 2-3 tokens.

Training Data

The text corpus used to train an LLM. Includes web pages, books, articles, code, and other text sources. The quality and extent of a brand's representation in training data affect its AI visibility.

Transformer

The neural network architecture underlying modern LLMs. Introduced in 2017's "Attention Is All You Need" paper, it enables efficient processing of sequential data.
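The Transformer's core operation, introduced in that paper, is scaled dot-product attention: each token's output is a weighted average of value vectors, weighted by how well its query vector matches every key vector. The pure-Python sketch below shows a single attention head only; it omits the learned projections, masking, and multiple heads that real models add, and the input numbers are arbitrary toy values.

```python
import math

Matrix = list[list[float]]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, computed row by row."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # How strongly this query matches every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # non-negative and sums to 1
        # The output row is the weighted average of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


# Three tokens with 2-dimensional query/key/value vectors (arbitrary toy numbers).
Q = K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
for row in attention(Q, K, V):
    print([round(x, 3) for x in row])
```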

U

User Agent

A string that identifies a web crawler or browser. AI crawlers use specific user agents (like GPTBot, ClaudeBot) that can be targeted in robots.txt.
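Python's standard `urllib.robotparser` applies robots.txt rules the way a compliant crawler would, which makes it a handy way to sanity-check per-crawler rules. The robots.txt content below is a hypothetical example, and `crawler_allowed` is an illustrative helper. Note that this parser compares against the text before the first `/` in the name you pass, so pass the product token (`GPTBot`) rather than the full user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot site-wide, keep ClaudeBot out of /private/,
# and allow every other crawler everywhere.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Allow: /
"""


def crawler_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("GPTBot", "ClaudeBot", "SomeOtherBot"):
    for url in ("https://example.com/blog/post", "https://example.com/private/notes"):
        print(agent, url, crawler_allowed(agent, url))
```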

Z

Zero-shot Learning

An AI's ability to perform a task without task-specific training or worked examples. For LLMs, a zero-shot prompt describes the task with instructions alone, without including examples of the desired output.
