AI Bot Accessibility
Whether major AI crawlers — GPTBot, ClaudeBot, Google-Extended, PerplexityBot — can reach a site. The highest-priority GEO signal.
What is AI Bot Accessibility?
AI Bot Accessibility describes whether the major AI crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, and others) can read a site's content and consider it for citation. Domains that block these bots drop out of both training data and the pool of real-time citation candidates, which makes AI Bot Accessibility the single highest-priority signal in GEO (Generative Engine Optimization) and AEO (Answer Engine Optimization) measurement.
It is governed by two files: robots.txt (the allow/disallow policy) and llms.txt (a proposed content-guidance convention, not yet a formal standard). Auditing both is the core of an AI bot accessibility check.
Eight major AI bots
| Bot | Operator | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training-data collection for ChatGPT |
| ChatGPT-User | OpenAI | Real-time fetcher invoked by ChatGPT Search / Plugins |
| ClaudeBot | Anthropic | Training and citation data for Claude |
| Google-Extended | Google | Opt-out token for Gemini training (Search features such as AI Overviews follow Googlebot rules) |
| PerplexityBot | Perplexity | Citations for Perplexity answers |
| Bytespider | ByteDance | Training for ByteDance LLMs (e.g., Doubao) |
| CCBot | Common Crawl | Shared crawl data used by many LLMs |
| Applebot-Extended | Apple | Opt-out token for Apple Intelligence training |
Each bot can be controlled independently through its own `User-agent: <bot-name>` group in robots.txt.
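The per-bot control described above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`; the sample robots.txt, the `example.com` URL, and the `bot_access` helper are illustrative assumptions, not part of any particular tool. Note that `urllib.robotparser` applies the first matching rule in a group rather than the longest-match rule of RFC 9309, so results can differ on complex files.

```python
from urllib.robotparser import RobotFileParser

# The eight bots from the table above.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended",
    "PerplexityBot", "Bytespider", "CCBot", "Applebot-Extended",
]

# Hypothetical robots.txt used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /private/
"""


def bot_access(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Map each AI bot name to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


access = bot_access(SAMPLE_ROBOTS_TXT)
for bot, ok in access.items():
    print(f"{bot:20} {'allowed' if ok else 'blocked'}")
```

In this sample, GPTBot is blocked by its own group, ClaudeBot is explicitly allowed, and every other bot falls back to the `User-agent: *` group, which only restricts `/private/`.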
Common mistakes
- Blocking everything with `User-agent: *`. A blanket `Disallow: /` under the wildcard group unintentionally blocks every AI bot that has no group of its own.
- Edge / firewall blocks. robots.txt allows the bot, but Cloudflare or AWS WAF blocks it at the edge.
- Assuming that allowing `Google-Extended` is enough. It is just the Gemini opt-out token, separate from Googlebot.
- Checking only `Disallow: /` and missing partial paths. Partial blocks like `Disallow: /blog/` still hurt visibility.
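Two of these mistakes, the wildcard block and the partial-path block, can be reproduced in a few lines. This is a rough sketch using the standard `urllib.robotparser`; the two robots.txt snippets, the sample paths, and the helper names `allowed` and `blocked_paths` are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies illustrating two of the mistakes above.
WILDCARD_BLOCK = "User-agent: *\nDisallow: /\n"
PARTIAL_BLOCK = "User-agent: PerplexityBot\nDisallow: /blog/\n"


def allowed(robots_txt: str, bot: str, path: str) -> bool:
    """Return True if robots_txt lets bot fetch path on an example host."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, "https://example.com" + path)


def blocked_paths(robots_txt: str, bot: str, sample_paths: list[str]) -> list[str]:
    """List the sample paths that bot may not fetch, not just the site root."""
    return [p for p in sample_paths if not allowed(robots_txt, bot, p)]


# The wildcard group blocks a bot that is never named in the file.
print(allowed(WILDCARD_BLOCK, "GPTBot", "/"))

# Checking only "/" would miss the partial block on /blog/.
print(blocked_paths(PARTIAL_BLOCK, "PerplexityBot", ["/", "/blog/post-1", "/docs/"]))
```

Testing a handful of representative paths rather than only the root is what surfaces the partial block.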
What a GEO analysis tool checks
- Whether `robots.txt` exists and returns 200 (vs 404)
- Allow/disallow state for 8–12 major AI bots
- Presence and shape of an `llms.txt` file
- Real bot responses through CDN / firewall (live test call)
- If blocked, whether it is intentional (matches site terms / privacy policy)
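The file-existence and live-response checks in this list could be sketched roughly as follows. This is not how any specific GEO tool is implemented: the `audit` and `http_status` functions, the `"audit-sketch"` User-Agent, and the report keys are all assumptions made for illustration. The sketch also sends only the bare bot token as the User-Agent header, whereas real crawlers send longer strings and firewalls may additionally verify source IP ranges, so a live result here is only an approximation.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET sent with the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the code is what we want.
        return err.code
    # Connection-level failures (urllib.error.URLError) propagate to the caller.


def audit(site: str, bots: list[str],
          fetch: Callable[[str, str], int] = http_status) -> dict:
    """Collect status codes for robots.txt, llms.txt, and one request per bot."""
    site = site.rstrip("/")
    return {
        "robots_txt_status": fetch(site + "/robots.txt", "audit-sketch"),
        "llms_txt_status": fetch(site + "/llms.txt", "audit-sketch"),
        # A 403 here while robots.txt allows the bot suggests an edge block.
        "live_status_by_bot": {bot: fetch(site + "/", bot) for bot in bots},
    }
```

Calling `audit("https://example.com", ["GPTBot", "ClaudeBot"])` would issue real requests; passing a stub as `fetch` keeps the logic testable offline. Whether a block is intentional still has to be judged against the site's terms and privacy policy by a person.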