GPTBot
GPTBot is OpenAI's web crawler. It collects publicly available web content that may be used to train future GPT models.
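GPTBot (user-agent token GPTBot, launched in August 2023) respects robots.txt. Site owners can block it, which keeps their content out of future training data, or allow it, so their content can become part of what future GPT models know. For a GEO strategy, this is a trade-off between content protection and AI visibility. OpenAI also runs two separate agents: OAI-SearchBot, which indexes pages for live search (SearchGPT), and ChatGPT-User, which fetches pages on demand when a user browses from ChatGPT. Each has its own user-agent token, so robots.txt rules written for GPTBot do not cover them.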
Example
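In robots.txt, these two lines block GPTBot from the whole site and so opt it out of training crawls:
User-agent: GPTBot
Disallow: /
To block selectively, narrow the path. This protects only premium content:
User-agent: GPTBot
Disallow: /premium/
Or write no rules that apply to GPTBot, either by name or through a catch-all User-agent: * group. GPTBot then crawls freely.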
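To sanity-check a rule before deploying it, the selective policy can be evaluated locally. The sketch below is an approximation that uses Python's standard `urllib.robotparser` as a stand-in: OpenAI's crawler uses its own robots.txt implementation, and Python's parser applies simple prefix matching with first-match order, so wildcards and rule precedence may be evaluated differently. The `can_crawl` helper and the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Selective policy: GPTBot may crawl everything except paths under /premium/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /premium/
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string, no network fetch
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("GPTBot", "https://example.com/blog/geo-guide"),
        ("GPTBot", "https://example.com/premium/report"),
        # No group names OAI-SearchBot and there is no "*" group, so it is allowed.
        ("OAI-SearchBot", "https://example.com/premium/report"),
    ]
    for agent, url in checks:
        print(agent, url, can_crawl(agent, url))
```

Passing a robots_txt string with `Disallow: /` instead makes every GPTBot check return False, matching the full block, while OAI-SearchBot stays allowed in both cases because no group applies to it.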
Frequently asked questions
Should I block GPTBot?
It is a strategic choice. Blocking keeps your content out of future models, but it costs GEO visibility, because those models will know less about you. For unique or copyright-sensitive content, blocking is reasonable. For marketing content, leave GPTBot open.
GPTBot vs OAI-SearchBot?
GPTBot crawls for training (in bulk, offline). OAI-SearchBot indexes pages for SearchGPT (in real time). ChatGPT-User fetches pages on demand when a user triggers browsing in ChatGPT.
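When auditing server logs, a short sketch like the following can tell the three agents apart by their user-agent tokens. The `OPENAI_AGENTS` table and `classify_openai_agent` helper are hypothetical, and matching is a plain case-insensitive substring test. A User-Agent header is easy to spoof, so a match only suggests an OpenAI agent; confirming it means checking the request's IP address against the ranges OpenAI publishes.

```python
from typing import Optional

# Hypothetical lookup table: user-agent token -> purpose, per the descriptions above.
OPENAI_AGENTS = {
    "GPTBot": "training crawl (bulk, offline)",
    "OAI-SearchBot": "search indexing (real time)",
    "ChatGPT-User": "on-demand fetch triggered by a ChatGPT user",
}


def classify_openai_agent(user_agent_header: str) -> Optional[str]:
    """Return the purpose of the first OpenAI token found in the header, else None.

    Plain case-insensitive substring matching: headers can be spoofed, so a
    match is a hint, not proof of origin.
    """
    ua = user_agent_header.lower()
    for token, purpose in OPENAI_AGENTS.items():
        if token.lower() in ua:
            return purpose
    return None


if __name__ == "__main__":
    # Simplified, illustrative header; real OpenAI headers are longer.
    print(classify_openai_agent("Mozilla/5.0 (compatible; GPTBot/1.1)"))
```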