
Crawl Testing for AI Visibility

See your site the way AI crawlers do. Testing helps identify content visibility issues, rendering problems, and blocked resources before they affect AI indexing.

In this guide

  • Testing with crawler user agents
  • Viewing pages without JavaScript
  • Identifying blocked resources
  • Automated crawl testing tools
12 min read · Prerequisite: llms.txt

Why Test Crawl Visibility?

What you see in a browser may differ significantly from what AI crawlers see:

JavaScript Rendering

Content rendered client-side via JavaScript may be invisible to crawlers that execute little or no JS; many AI crawlers fetch only the raw HTML.

User Agent Detection

Your site might serve different content to bots vs browsers.

Blocked Resources

CSS or JS files blocked in robots.txt can prevent proper rendering.
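
You can check the last point from the command line by grepping robots.txt for Disallow rules that touch asset paths. A minimal sketch, assuming robots.txt lives at the standard location; the path patterns are common examples, not an exhaustive list:

# List robots.txt rules that could block CSS/JS assets
curl -s https://yoursite.com/robots.txt | grep -iE "disallow:.*(\.css|\.js|/assets|/static)"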

Testing with curl

The simplest way to see the raw response a crawler receives is to use curl with different user agents:

# Test as GPTBot
curl -A "GPTBot/1.0" https://yoursite.com/page

# Test as ClaudeBot
curl -A "ClaudeBot/1.0" https://yoursite.com/page

# Test as Googlebot
curl -A "Googlebot/2.1" https://yoursite.com/page

# Save output for analysis
curl -A "GPTBot/1.0" https://yoursite.com/page > gptbot-view.html

# Check response headers
curl -I -A "GPTBot/1.0" https://yoursite.com/page
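
To check for user-agent-based serving, fetch the same page with a crawler user agent and a generic browser user agent, then diff the two responses. A minimal sketch; the browser string below is just an example:

# Fetch the same page as GPTBot and as a desktop browser, then compare
curl -s -A "GPTBot/1.0" https://yoursite.com/page > bot-view.html
curl -s -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" https://yoursite.com/page > browser-view.html
diff bot-view.html browser-view.html | head -20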

AI Crawler User Agents

Use these user agent strings for testing:

Crawler            User Agent String
GPTBot             GPTBot/1.0
ClaudeBot          ClaudeBot/1.0
Google-Extended    Google-Extended
PerplexityBot      PerplexityBot
CCBot              CCBot/2.0

These are simplified tokens rather than the full user agent strings real crawlers send, but they are enough to test how your server responds to each bot.
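
A simple way to use this table is to save each crawler's view of a page to its own file for side-by-side comparison. A sketch, with the page URL as a placeholder:

# Save each crawler's view of a page as <crawler>-view.html
for agent in "GPTBot/1.0" "ClaudeBot/1.0" "PerplexityBot" "CCBot/2.0"; do
  name=$(echo "${agent%%/*}" | tr '[:upper:]' '[:lower:]')
  curl -s -A "$agent" "https://yoursite.com/page" > "${name}-view.html"
done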

Browser-Based Testing

Disable JavaScript

See what crawlers without JS rendering see:

  • Chrome: DevTools → Settings → Debugger → Disable JavaScript
  • Firefox: about:config → javascript.enabled → false
  • Or use a browser extension like "Quick JavaScript Switcher"

View Source vs Inspect

View Source shows raw HTML (what crawlers fetch). Inspect Element shows the rendered DOM (what users see). Compare both to understand JS-rendered content.
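
You can make the same comparison from the command line by diffing the raw HTML against the DOM produced by a headless browser. A sketch, assuming Chrome or Chromium is installed (the binary name varies by platform):

# Raw HTML (what most crawlers fetch) vs the JS-rendered DOM
curl -s https://yoursite.com/page > raw.html
google-chrome --headless --dump-dom https://yoursite.com/page > rendered.html
# Lines that appear only in the rendered DOM were injected by JavaScript
diff raw.html rendered.html | grep '^>' | head -20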

User Agent Switching

Use DevTools Network Conditions to set custom user agents, or install an extension like "User-Agent Switcher" to test different crawler identities.

Google Search Console

While focused on Googlebot, Search Console's tools reveal rendering issues that affect all crawlers:

URL Inspection Tool

Enter a URL from your verified property to see how Google renders it. The tool shows the rendered HTML, a screenshot, and any blocked resources.

Coverage Report

Identifies pages with crawl errors, pages blocked by robots.txt, and pages excluded from indexing.

Automated Testing Script

Create a simple script to test multiple pages:

#!/bin/bash
# crawl-test.sh - Test pages with AI crawler user agents

URLS=(
  "https://yoursite.com/"
  "https://yoursite.com/about"
  "https://yoursite.com/products"
  "https://yoursite.com/pricing"
)

AGENTS=(
  "GPTBot/1.0"
  "ClaudeBot/1.0"
  "Googlebot/2.1"
)

for url in "${URLS[@]}"; do
  echo "Testing: $url"
  for agent in "${AGENTS[@]}"; do
    status=$(curl -s -o /dev/null -w "%{http_code}" -A "$agent" "$url")
    echo "  $agent: $status"
  done
  echo ""
done
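
Make the script executable and run it:

chmod +x crawl-test.sh
./crawl-test.sh

Any status other than 200 for an important page, or a status that differs between agents, is worth investigating.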

What to Check

Crawl Testing Checklist

  • Main content visible in raw HTML (View Source)
  • No cloaking (same content for bots and users)
  • 200 status for all important pages
  • Correct canonical URLs
  • No blocked CSS/JS needed for rendering
  • Meta robots allows indexing
  • Structured data present
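
Several items on this checklist can be spot-checked against the raw HTML a crawler receives. The grep patterns below are rough heuristics and the URL is a placeholder, so adjust both to your own markup:

# Spot-check canonical, meta robots, and structured data in crawler-served HTML
page=$(curl -s -A "GPTBot/1.0" https://yoursite.com/page)

echo "$page" | grep -io '<link[^>]*rel="canonical"[^>]*>'   # canonical URL tag
echo "$page" | grep -io '<meta[^>]*name="robots"[^>]*>'     # meta robots directives
echo "$page" | grep -c 'application/ld+json'                # count of JSON-LD script blocks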

Common Issues to Find

Empty content area

View Source shows <div id="content"></div> because content only loads via JS.
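
A quick test is to search the crawler-fetched HTML for a phrase you know appears in the visible page body (the phrase below is a placeholder):

# Zero matches means the body text is not in the raw HTML and is likely injected by JS
curl -s -A "GPTBot/1.0" https://yoursite.com/page | grep -c "a phrase from your page body"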

Bot-specific redirects

Bots are redirected to different pages than users. This is a cloaking red flag.
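
curl's write-out variables make this easy to spot: differing status codes or redirect targets between the two requests below are a warning sign (the browser string is just an example):

# Compare redirect behaviour for a crawler UA vs a browser UA
curl -s -o /dev/null -w "GPTBot:  %{http_code} -> %{redirect_url}\n" \
  -A "GPTBot/1.0" https://yoursite.com/page
curl -s -o /dev/null -w "Browser: %{http_code} -> %{redirect_url}\n" \
  -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" https://yoursite.com/page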

Login walls

Content behind authentication is invisible to all crawlers.
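
To confirm which pages are affected, check the status a crawler gets for an authenticated path (the /account path below is hypothetical):

# Auth-gated pages typically return 401/403 or redirect to a login page
curl -s -o /dev/null -w "%{http_code} %{redirect_url}\n" -A "GPTBot/1.0" https://yoursite.com/account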

Key Takeaway

Test like a crawler, think like a crawler.

Regular crawl testing reveals issues before they impact AI visibility. Use curl, disable JavaScript, and check View Source to see what AI crawlers actually see.
