How Gemini Crawlers Work

Google runs two crawler families: Googlebot for search and Google-Extended for Gemini. Blocking the wrong one can silently remove you from AI answers.

Your robots.txt file may be hiding your site from Gemini, and you would never know it from your search rankings. Google runs two distinct crawler families, and a single misplaced directive will quietly cut you off from one without affecting the other.

Two Crawler Families, One Domain

Google operates a family of crawlers[1]:

  • Googlebot indexes pages for Google Search
  • Google-Extended controls whether your content is used for AI training and Gemini real-time answers[2]
  • AdsBot evaluates landing page quality for Google Ads
  • APIs-Google serves Google APIs and internal products

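Google-Extended was introduced in September 2023 as a dedicated robots.txt token for AI training access, independent of search indexing. It has no user agent string of its own: Google fetches pages with its existing crawlers and consults the Google-Extended rules to decide whether that content may be used for Gemini.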

How They Differ

Googlebot rules govern what appears in Google Search, including AI Overviews, which are a Search feature. Google-Extended rules govern whether your content can be used to train Gemini's models and to ground Gemini's real-time answers. The key distinction is that the two respond to different robots.txt directives. A rule targeting Googlebot does not apply to Google-Extended, and vice versa.
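The independence of the two tokens can be checked locally. The sketch below uses Python's standard-library `urllib.robotparser`; the `example.com` URL, the `/private/` path, and the `is_allowed` helper are illustrative assumptions, and Python's parser only approximates Google's own matching rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that only names Googlebot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def is_allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if `token` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


url = "https://example.com/private/page"
googlebot_ok = is_allowed(ROBOTS_TXT, "Googlebot", url)
extended_ok = is_allowed(ROBOTS_TXT, "Google-Extended", url)

print("Googlebot allowed:", googlebot_ok)        # False: the rule names Googlebot
print("Google-Extended allowed:", extended_ok)   # True: no rule applies to it
```

The Googlebot rule leaves Google-Extended untouched because no group in the file names it and there is no `User-agent: *` fallback.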

How to Control Access

Block only Google-Extended (keep search, opt out of AI):

```
User-agent: Google-Extended
Disallow: /
```

Block all (removes from both search and AI):

```
User-agent: *
Disallow: /
```

Common Mistakes

  • Accidentally blocking Google-Extended via a catch-all `User-agent: *` rule without realizing it
  • Assuming that blocking Google-Extended only affects training, when it also affects Gemini's real-time answers
  • Using `noindex` thinking it only affects search (it affects all Google crawlers)
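The first mistake is easy to reproduce. The following sketch, again using Python's `urllib.robotparser` as an approximation of Google's behavior, compares a catch-all block with a variant that adds an explicit group for the Google tokens. The file contents, the `can_fetch` helper, and the `SomeOtherBot` name are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, token: str, url: str = "https://example.com/") -> bool:
    """Parse robots.txt text and ask whether `token` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


# Catch-all only: any token without its own group inherits the block.
CATCH_ALL = """\
User-agent: *
Disallow: /
"""

# Same catch-all, plus an explicit group for the Google tokens to keep.
WITH_GOOGLE_GROUP = """\
User-agent: *
Disallow: /

User-agent: Googlebot
User-agent: Google-Extended
Allow: /
"""

print(can_fetch(CATCH_ALL, "Google-Extended"))          # False
print(can_fetch(WITH_GOOGLE_GROUP, "Google-Extended"))  # True
print(can_fetch(WITH_GOOGLE_GROUP, "SomeOtherBot"))     # False
```

A token with its own group follows that group and ignores the `*` rules, which is why the explicit group restores access for Google while other bots remain blocked.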

How Site Scanner Helps

Site Scanner audits your robots.txt for unintentional Google-Extended blocks. See also How Agent Crawlers Work.
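For a rough idea of what such an audit involves, here is a minimal sketch. It is not Site Scanner's implementation; the `audit` and `named_agents` functions, the status strings, and the sample file are assumptions made for illustration, and the check covers only the site root.

```python
from urllib.robotparser import RobotFileParser

GOOGLE_TOKENS = ("Googlebot", "Google-Extended")


def named_agents(robots_txt: str) -> set[str]:
    """Collect the lowercase user-agent tokens that have an explicit group."""
    agents = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def audit(robots_txt: str, url: str = "https://example.com/") -> dict[str, str]:
    """Report, per Google token, whether `url` is allowed and why not if blocked."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    explicit = named_agents(robots_txt)
    report = {}
    for token in GOOGLE_TOKENS:
        if parser.can_fetch(token, url):
            report[token] = "allowed"
        elif token.lower() in explicit:
            report[token] = "blocked explicitly"
        else:
            report[token] = "blocked by catch-all"  # likely unintentional
    return report


# Hypothetical file: search is kept, but Google-Extended falls through to "*".
SAMPLE = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

for token, status in audit(SAMPLE).items():
    print(f"{token}: {status}")
```

Separating "blocked explicitly" from "blocked by catch-all" matters because only the second case is likely to be an accident.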
