Your robots.txt file may be hiding your site from Gemini, and you would never know it from your search rankings. Google runs two distinct crawler families, and a single misplaced directive will quietly cut you off from one without affecting the other.
Two Crawler Families, One Domain
Google operates a family of crawlers[1]:
- Googlebot indexes pages for Google Search
- Google-Extended controls whether content is used for AI training and to ground Gemini's real-time answers[2]
- AdsBot evaluates landing page quality for Google Ads
- APIs-Google serves Google APIs and internal products
Google-Extended was introduced in September 2023 as a dedicated robots.txt token for controlling AI training access, independent of search indexing. Strictly speaking it is a control token rather than a separate crawler: pages are still fetched by Google's existing user agents, and the token only governs how the fetched content may be used.
How They Differ
Googlebot crawls to index and rank pages for Google Search; Google-Extended governs whether that content may be used to train Gemini's models and to ground Gemini's real-time answers. The key distinction is that they respond to different robots.txt directives: a group targeting Googlebot does not apply to Google-Extended, and vice versa.
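You can see the per-agent matching directly with Python's standard-library robots.txt parser. This is a minimal sketch (the robots.txt contents are illustrative): a rule restricting Googlebot says nothing at all about Google-Extended.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: Googlebot is only kept out of /private/,
# while Google-Extended is blocked from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each agent is matched only against its own group.
print(parser.can_fetch("Googlebot", "/article.html"))        # True
print(parser.can_fetch("Google-Extended", "/article.html"))  # False
```

The same page is crawlable for search but off-limits for AI use, which is exactly the asymmetry the two directives exist to express.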
How to Control Access
Block only Google-Extended (keep search, opt out of AI):

```
User-agent: Google-Extended
Disallow: /
```

Block all crawlers (removes you from both search and AI):

```
User-agent: *
Disallow: /
```
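Putting the first case together explicitly: a file that keeps Googlebot while opting out of AI. The ordering of the groups does not matter; crawlers match robots.txt groups by user-agent name, not top to bottom.

```
# Keep Google Search access
User-agent: Googlebot
Allow: /

# Opt out of Gemini training and grounding
User-agent: Google-Extended
Disallow: /
```

Spelling out the Googlebot group is optional here, but it makes the intent unambiguous if a catch-all rule is added later.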
Common Mistakes
- Letting a catch-all `User-agent: *` Disallow rule silently block Google-Extended, even when Googlebot is explicitly allowed elsewhere in the file
- Assuming that blocking Google-Extended only affects training, when it also affects Gemini's real-time answers
- Reaching for `noindex` to opt out of AI use, when `noindex` only controls search indexing; Google-Extended is governed by robots.txt, not by meta tags
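The catch-all mistake above can be detected mechanically. A minimal sketch using Python's standard-library parser (the function name is illustrative, not a real Site Scanner API): feed it the text of a site's robots.txt and it reports whether Google-Extended is shut out of the root.

```python
from urllib.robotparser import RobotFileParser

def is_google_extended_blocked(robots_txt: str) -> bool:
    """Return True if this robots.txt blocks Google-Extended from '/'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("Google-Extended", "/")

# A catch-all Disallow blocks Google-Extended even though it is never named.
catch_all = "User-agent: *\nDisallow: /\n"
print(is_google_extended_blocked(catch_all))  # True

# A file that only restricts Googlebot leaves Google-Extended untouched.
googlebot_only = "User-agent: Googlebot\nDisallow: /private/\n"
print(is_google_extended_blocked(googlebot_only))  # False
```

Running a check like this against your own robots.txt is the quickest way to confirm whether an old catch-all rule is quietly opting you out of AI surfaces.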
How Site Scanner Helps
Site Scanner audits your robots.txt for unintentional Google-Extended blocks. See also How Agent Crawlers Work.