ChatGPT / GPT-4

ChatGPT is the most widely used AI assistant, with over 400 million weekly active users. Understanding how it retrieves information is essential for any AI optimisation strategy.

In this guide

  • GPT-4 variants and their differences
  • When ChatGPT uses web browsing
  • Training data sources and cutoffs
  • Optimisation strategies for ChatGPT

8 min read. Prerequisite: Model Comparison

  • GPT-4o cutoff: Jun 2024
  • Context window: 128K tokens
  • Web search: Yes
  • Weekly users: 400M+

Understanding the GPT-4 Family

OpenAI offers several GPT-4 variants, each with different capabilities:

GPT-4o ("omni"), the most common variant

The default model for ChatGPT Plus users. Handles text, images, and audio. Training cutoff: June 2024.

Web browsing: Available, triggered automatically or on request

GPT-4 Turbo

Released in April 2024, but its training data ends in December 2023, earlier than GPT-4o's, and it is not the default. GPT-4o is the better choice for recent information when browsing is off.

Web browsing: Available via API

GPT-4o-mini

Faster, cheaper variant. Often used in free tier and high-volume applications.

Web browsing: Limited in free tier

When Does ChatGPT Search the Web?

ChatGPT doesn't always use web search. Understanding when it does is crucial for your strategy:

Likely to Search

  • Questions about recent events
  • Current pricing or availability
  • Queries with years or recency words ("2024", "latest")
  • User explicitly requests search
  • Unknown entities or niche topics

Likely Uses Training Data

  • General knowledge questions
  • Well-known brands and products
  • Historical information
  • How-to and educational content
  • Coding and technical help
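
The two lists above can be turned into a rough query-audit heuristic. The sketch below is illustrative only: OpenAI does not publish how ChatGPT decides when to search, so the word lists, the year pattern, and the function name `likely_triggers_search` are all assumptions made for this example.

```python
import re

# Hypothetical signals, loosely based on the "Likely to Search" list.
# These are assumptions for illustration, not ChatGPT's actual rules.
RECENCY_WORDS = {
    "latest", "today", "current", "currently", "recent",
    "price", "pricing", "availability",
}
EXPLICIT_SEARCH = ("search the web", "look up", "browse the web")
YEAR_PATTERN = re.compile(r"\b20\d{2}\b")


def likely_triggers_search(query: str) -> bool:
    """Guess whether a query resembles the 'Likely to Search' list."""
    text = query.lower()
    words = set(re.findall(r"[a-z]+", text))
    # The user explicitly asks for a search.
    if any(phrase in text for phrase in EXPLICIT_SEARCH):
        return True
    # The query mentions a year such as "2024".
    if YEAR_PATTERN.search(text):
        return True
    # The query contains a recency or pricing word.
    return bool(words & RECENCY_WORDS)


for q in ["What is the latest pricing for Acme?",
          "How do I reverse a list in Python?"]:
    print(q, "->", likely_triggers_search(q))
```

Running a list of the queries your customers actually ask through a check like this gives a first estimate of which ones depend on search retrieval and which depend on training data. It is deliberately crude: any year from 2000 onward counts as a recency signal, so historical questions can be misclassified.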

Key Takeaway

ChatGPT uses a hybrid approach.

You need to optimise for both scenarios: build presence in authoritative sources for training data inclusion, AND maintain strong SEO for when it searches. The model decides which approach to use based on the query.

What Training Data Includes

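OpenAI has disclosed only in broad terms that GPT-4's training data includes publicly available data (such as internet data) and data licensed from third-party providers.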

Optimisation Strategies for ChatGPT

1. For Training Data Inclusion

  • Get featured in major publications (TechCrunch, Forbes, industry publications)
  • Maintain accurate Wikipedia presence if notable
  • Publish on high-authority domains that allow GPTBot crawling
  • Create comprehensive, factual content about your brand

2. For Web Search Retrieval

  • Maintain strong search SEO on Bing as well as Google (ChatGPT's web search has relied on Bing results)
  • Optimise for featured snippets and direct answers
  • Keep content fresh with clear update dates
  • Structure content with clear headings and FAQ sections

3. Technical Considerations

  • Allow GPTBot in robots.txt to enable training data crawling
  • Use schema markup for entity disambiguation
  • Ensure fast page load times for search retrieval
  • Make key information accessible without JavaScript
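
As an illustration of schema markup for entity disambiguation, the sketch below builds a schema.org `Organization` block as JSON-LD. The company name, URLs, and the helper name `organization_jsonld` are placeholders invented for this example; `@context`, `@type`, `name`, `url`, and `sameAs` are standard schema.org vocabulary.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Build a schema.org Organization description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # Profiles that identify the same entity elsewhere on the web.
        "sameAs": same_as,
    }
    return json.dumps(data, indent=2)


# Placeholder values; replace with your own brand details.
markup = organization_jsonld(
    "Example Co",
    "https://www.example.com",
    ["https://www.linkedin.com/company/example-co"],
)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

The printed `<script>` element would go in the page's HTML. Because it is plain markup, it also stays readable to crawlers that do not execute JavaScript.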

Technical Implementation

OpenAI's GPTBot crawler can be controlled via robots.txt. Learn how to configure it to allow training data crawling while protecting sensitive content.

GPTBot Configuration
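
A minimal sketch of checking such a configuration before deploying it, using Python's standard `urllib.robotparser`. The robots.txt rules, the `/account/` path, and the example.com URLs are assumptions for illustration; `GPTBot` is the user-agent token OpenAI documents for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may crawl everything except /account/.
# The Disallow line comes first because Python's parser applies the
# first rule that matches a path.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /account/
Allow: /

User-agent: *
Allow: /
"""


def gptbot_can_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets GPTBot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


print(gptbot_can_fetch(ROBOTS_TXT, "https://www.example.com/blog/post"))
print(gptbot_can_fetch(ROBOTS_TXT, "https://www.example.com/account/settings"))
```

Real crawlers may resolve overlapping Allow and Disallow rules differently from Python's first-match behaviour, so treat this as a sanity check rather than a guarantee of how GPTBot will interpret the file.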

Common ChatGPT Issues

Outdated Information

When it answers without searching, ChatGPT may cite old pricing, discontinued products, or outdated company descriptions, because its training data ends in June 2024.

Competitor Confusion

If your brand name is similar to others, ChatGPT may conflate information. Use distinctive brand language consistently.

Missing from Responses

If ChatGPT doesn't mention your brand, its training data likely contains too little about you. Focus on building a presence in authoritative sources.