OpenAI doesn't have one crawler. It has three, and they do fundamentally different things. Block the wrong one and you cut off ChatGPT from your content entirely.
The Three Crawlers
OpenAI operates three crawlers, each with a separate job[1]:
- GPTBot collects training data. It has no connection to live conversations.
- ChatGPT-User browses your site in real time when a user asks ChatGPT to look something up.
- OAI-SearchBot indexes content for ChatGPT Search results, the answer panel that competes with Google's featured snippets.
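Because each crawler identifies itself in the User-Agent header, you can tell them apart in your server logs with a simple substring check. A minimal sketch in Python (the helper name and role labels are illustrative, not part of any official API):

```python
# Map each OpenAI crawler's User-Agent token to its role.
# The tokens match the crawler names OpenAI documents; the
# role labels are descriptive strings chosen for this example.
OPENAI_CRAWLERS = {
    "GPTBot": "training",
    "ChatGPT-User": "live browsing",
    "OAI-SearchBot": "search indexing",
}

def classify_openai_crawler(user_agent: str):
    """Return the crawler's role, or None if the agent is not an OpenAI crawler."""
    for token, role in OPENAI_CRAWLERS.items():
        if token in user_agent:
            return role
    return None
```

Running this over access logs tells you which of the three crawlers is actually visiting, which is worth checking before you change any robots.txt rules.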
robots.txt Control
Each crawler has its own user-agent string, so you can block them independently:
```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /
```
This example opts out of training while keeping your site visible in live conversations and search results. Adjust the directives to match your own policy.
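You can verify a policy like this before deploying it using Python's standard-library robots.txt parser. A sketch (the example.com URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The same policy as above: block training, allow live browsing and search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot is denied; the other two crawlers are allowed.
for bot in ("GPTBot", "ChatGPT-User", "OAI-SearchBot"):
    print(bot, "allowed:", parser.can_fetch(bot, "https://example.com/pricing"))
```

This is the same matching logic crawlers apply: the most specific matching `User-agent` group wins, so each bot follows its own rules rather than any catch-all.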
Common Mistakes
- Blocking GPTBot and assuming it prevents all ChatGPT access. GPTBot only collects training data. ChatGPT-User is the crawler that browses during live conversations.
- Blocking ChatGPT-User on pages you want recommended. If a user asks ChatGPT about your product and ChatGPT-User cannot fetch the page, ChatGPT has nothing to cite or summarize.
- Using a catch-all `User-agent: *` with `Disallow: /`. This blocks all three crawlers plus every other bot on the internet.
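The catch-all mistake is easy to demonstrate with the same standard-library parser: when there is no crawler-specific group, the `*` rule applies to every bot. A sketch with a placeholder URL:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt consisting only of a blanket catch-all block.
CATCH_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(CATCH_ALL.splitlines())

# With no crawler-specific groups, the * rule applies to everyone,
# so all three OpenAI crawlers (and every other bot) are denied.
for bot in ("GPTBot", "ChatGPT-User", "OAI-SearchBot"):
    print(bot, "allowed:", parser.can_fetch(bot, "https://example.com/"))
```

If you want a blanket block with exceptions, add explicit `Allow` groups for the crawlers you do want, since a specific group overrides the catch-all for that bot.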
How Site Scanner Helps
Site Scanner checks your robots.txt for misconfigurations affecting each of OpenAI's three crawlers. See also How Agent Crawlers Work.