OpenAI doesn't have one crawler. It has three, and they do fundamentally different things. Block the wrong one and you cut off ChatGPT from your content entirely.
The Three Crawlers
OpenAI operates three crawlers, each with a separate job[1]:
- GPTBot collects training data. It has no connection to live conversations.
- ChatGPT-User browses your site in real time when a user asks ChatGPT to look something up.
- OAI-SearchBot indexes content for ChatGPT Search results, the answer panel that competes with Google's featured snippets.
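Because each crawler identifies itself in the User-Agent header, you can tell them apart in your server logs with a simple substring check. A minimal sketch in Python (the helper name and role labels are illustrative, not part of any official API):

```python
# Map each OpenAI crawler's User-Agent token to its role.
# The tokens match the crawler names OpenAI documents; the
# role labels are descriptive strings chosen for this example.
OPENAI_CRAWLERS = {
    "GPTBot": "training",
    "ChatGPT-User": "live browsing",
    "OAI-SearchBot": "search indexing",
}

def classify_openai_crawler(user_agent: str):
    """Return the crawler's role, or None if the agent is not an OpenAI crawler."""
    for token, role in OPENAI_CRAWLERS.items():
        if token in user_agent:
            return role
    return None
```

Running this over access logs tells you which of the three crawlers is actually visiting, which is worth checking before you change any robots.txt rules.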
robots.txt Control
Each crawler has its own user-agent string, so you can block them independently:
```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /
```
This example opts out of training while keeping your site visible in live conversations and search results. Adjust the directives to match your own policy.
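You can verify a policy like this before deploying it using Python's standard-library robots.txt parser. A sketch (the example.com URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The same policy as above: block training, allow live browsing and search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot is denied; the other two crawlers are allowed.
for bot in ("GPTBot", "ChatGPT-User", "OAI-SearchBot"):
    print(bot, "allowed:", parser.can_fetch(bot, "https://example.com/pricing"))
```

This is the same matching logic crawlers apply: the most specific matching `User-agent` group wins, so each bot follows its own rules rather than any catch-all.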
Common Mistakes
- Blocking GPTBot and assuming it prevents all ChatGPT access. GPTBot only collects training data. ChatGPT-User is the crawler that browses during live conversations.
- Blocking ChatGPT-User on pages you want recommended. If a user asks ChatGPT about your product and ChatGPT-User cannot fetch the page, ChatGPT has nothing to cite or summarize.
- Using a catch-all `User-agent: *` with `Disallow: /`. This blocks all three crawlers plus every other bot on the internet.
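The catch-all mistake is easy to demonstrate with the same standard-library parser: when there is no crawler-specific group, the `*` rule applies to every bot. A sketch with a placeholder URL:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt consisting only of a blanket catch-all block.
CATCH_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(CATCH_ALL.splitlines())

# With no crawler-specific groups, the * rule applies to everyone,
# so all three OpenAI crawlers (and every other bot) are denied.
for bot in ("GPTBot", "ChatGPT-User", "OAI-SearchBot"):
    print(bot, "allowed:", parser.can_fetch(bot, "https://example.com/"))
```

If you want a blanket block with exceptions, add explicit `Allow` groups for the crawlers you do want, since a specific group overrides the catch-all for that bot.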
How Site Scanner Helps
Site Scanner checks your robots.txt for misconfigurations affecting each of OpenAI's three crawlers. See also How Agent Crawlers Work.