Crawl Testing for AI Visibility
See your site the way AI crawlers do. Testing helps identify content visibility issues, rendering problems, and blocked resources before they affect AI indexing.
In this guide
- Testing with crawler user agents
- Viewing pages without JavaScript
- Identifying blocked resources
- Automated crawl testing tools
Why Test Crawl Visibility?
What you see in a browser may differ significantly from what AI crawlers see:
JavaScript Rendering
Many AI crawlers have limited or no JavaScript support, so content rendered client-side may be invisible to them.
User Agent Detection
Your site might serve different content to bots than to browsers; a quick way to check for this appears below.
Blocked Resources
CSS or JS files blocked in robots.txt can prevent proper rendering.
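To check for user-agent-based content switching, fetch the same page with a browser user agent and a crawler user agent and compare the responses. This is a minimal sketch; the URL is a placeholder and the browser string is just an example:

```bash
# Compare what a browser UA and an AI crawler UA receive (placeholder URL)
URL="https://yoursite.com/page"
BROWSER_UA="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
CRAWLER_UA="GPTBot/1.0"

curl -s -A "$BROWSER_UA" "$URL" > browser.html
curl -s -A "$CRAWLER_UA" "$URL" > crawler.html

# Differences in size or checksum suggest UA-based content switching
wc -c browser.html crawler.html
sha256sum browser.html crawler.html   # shasum -a 256 on macOS
```

Dynamic elements such as CSRF tokens or timestamps can cause benign differences between any two fetches, so diff the files before concluding that the site is cloaking.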
Testing with curl
The simplest way to see the raw response a crawler receives is to fetch pages with curl using different user agents:
```bash
# Test as GPTBot
curl -A "GPTBot/1.0" https://yoursite.com/page

# Test as ClaudeBot
curl -A "ClaudeBot/1.0" https://yoursite.com/page

# Test as Googlebot
curl -A "Googlebot/2.1" https://yoursite.com/page

# Save output for analysis
curl -A "GPTBot/1.0" https://yoursite.com/page > gptbot-view.html

# Check response headers
curl -I -A "GPTBot/1.0" https://yoursite.com/page
```

AI Crawler User Agents
Use these user agent strings for testing:
| Crawler | User Agent String |
|---|---|
| GPTBot | GPTBot/1.0 |
| ClaudeBot | ClaudeBot/1.0 |
| Google-Extended | Google-Extended |
| PerplexityBot | PerplexityBot |
| CCBot | CCBot/2.0 |
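These short strings are product tokens used for matching rather than the full user agent headers the bots actually send, and Google-Extended in particular is a robots.txt control token rather than a separate fetching agent. To confirm which tokens your robots.txt actually addresses, a minimal sketch (domain is a placeholder):

```bash
# Fetch robots.txt once, then look for rules naming each crawler token
curl -s https://yoursite.com/robots.txt > robots.txt
for token in GPTBot ClaudeBot Google-Extended PerplexityBot CCBot; do
  echo "== $token =="
  grep -i -A 2 "user-agent: *$token" robots.txt \
    || echo "  no explicit rules (falls back to User-agent: *)"
done
```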
Browser-Based Testing
Disable JavaScript
See what crawlers without JS rendering see:
- Chrome: DevTools → Settings → Debugger → Disable JavaScript
- Firefox: about:config → javascript.enabled → false
- Or use a browser extension like "Quick JavaScript Switcher"
View Source vs Inspect
View Source shows the raw HTML (what crawlers fetch). Inspect Element shows the rendered DOM (what users see). Compare the two to identify content that only exists after JavaScript runs.
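To quantify how much content depends on JavaScript, you can dump the rendered DOM with headless Chrome and compare it to the raw HTML. A minimal sketch, assuming a Chrome or Chromium binary is installed (its name varies: google-chrome, chromium, etc.) and using a placeholder URL:

```bash
# Raw HTML, as a non-rendering crawler fetches it (placeholder URL)
curl -s -A "GPTBot/1.0" https://yoursite.com/page > raw.html

# Rendered DOM after JavaScript has run (requires Chrome/Chromium)
google-chrome --headless --dump-dom https://yoursite.com/page > rendered.html

# A large gap in word count suggests JS-dependent content
wc -w raw.html rendered.html
```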
User Agent Switching
Use DevTools Network Conditions to set custom user agents, or install an extension like "User-Agent Switcher" to test different crawler identities.
Google Search Console
While it focuses on Googlebot, Search Console's tools surface rendering issues that typically affect other crawlers as well:
URL Inspection Tool
Enter a URL to see how Google renders it, including the rendered HTML, a screenshot, and any blocked resources.
Coverage Report
Identifies pages with crawl errors, pages blocked by robots.txt, and pages excluded from indexing.
Automated Testing Script
Create a simple script to test multiple pages:
```bash
#!/bin/bash
# crawl-test.sh - Test pages with AI crawler user agents

URLS=(
  "https://yoursite.com/"
  "https://yoursite.com/about"
  "https://yoursite.com/products"
  "https://yoursite.com/pricing"
)

AGENTS=(
  "GPTBot/1.0"
  "ClaudeBot/1.0"
  "Googlebot/2.1"
)

for url in "${URLS[@]}"; do
  echo "Testing: $url"
  for agent in "${AGENTS[@]}"; do
    status=$(curl -s -o /dev/null -w "%{http_code}" -A "$agent" "$url")
    echo "  $agent: $status"
  done
  echo ""
done
```

What to Check
Crawl Testing Checklist
- [ ] Main content visible in raw HTML (View Source)
- [ ] No cloaking (same content for bots and users)
- [ ] 200 status for all important pages
- [ ] Correct canonical URLs
- [ ] No blocked CSS/JS needed for rendering
- [ ] Meta robots allows indexing
- [ ] Structured data present
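Several of these items can be spot-checked from the command line against the raw HTML. A minimal sketch, using a placeholder URL and intentionally loose grep patterns:

```bash
# Spot-check a few checklist items against the raw HTML (placeholder URL)
URL="https://yoursite.com/page"
curl -s -A "GPTBot/1.0" "$URL" > page.html

echo "HTTP status:"
curl -s -o /dev/null -w "%{http_code}\n" -A "GPTBot/1.0" "$URL"

echo "Meta robots noindex (should print nothing):"
grep -io '<meta[^>]*robots[^>]*noindex[^>]*>' page.html

echo "Canonical URL:"
grep -io '<link[^>]*rel="canonical"[^>]*>' page.html

echo "Structured data (JSON-LD blocks):"
grep -c 'application/ld+json' page.html
```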
Common Issues to Find
Empty content area
View Source shows an empty `<div id="content"></div>` because the content only loads via JavaScript; a scripted check for this appears below.
Bot-specific redirects
Bots are redirected to different pages than users; this is a cloaking red flag.
Login walls
Content behind authentication is invisible to all crawlers.
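A quick way to catch the empty content area issue above: pick a phrase you can see in the rendered page and check whether it appears in the raw HTML. A minimal sketch with a placeholder URL and phrase:

```bash
# Check whether visible text exists in the raw HTML (placeholders throughout)
URL="https://yoursite.com/page"
PHRASE="Our pricing plans"   # any phrase visible in the rendered page

if curl -s -A "GPTBot/1.0" "$URL" | grep -qi "$PHRASE"; then
  echo "OK: phrase found in raw HTML"
else
  echo "WARNING: phrase missing - content may only appear after JavaScript runs"
fi
```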
Key Takeaway
Test like a crawler, think like a crawler.
Regular crawl testing reveals issues before they impact AI visibility. Use curl, disable JavaScript, and check View Source to see what AI crawlers actually see.
Sources
- Google Search Console: URL inspection tool for testing crawler views
- Overview of Google Crawlers: Understanding different bot user agents