Disallowing GPTBot and Other AI Bots From Accessing Your Site Content

OpenAI has documented how its GPTBot crawls the web and ingests content into its models, including basic instructions on what to add to robots.txt to disallow bot access to your site.

Pressable does not currently disallow these bots by default, but it’s possible to block them from accessing your site by following the guide below.

Disallowing GPTBot

To disallow GPTBot from accessing your site entirely, add the following to your site’s robots.txt file:

User-agent: GPTBot
Disallow: /
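
If you would like to confirm the rule behaves as expected, Python’s standard-library urllib.robotparser can parse these directives and report whether a given user agent may fetch a URL. This is only a quick sanity check; the example.com URLs below are placeholders for your own domain.

from urllib.robotparser import RobotFileParser

# The rules from the snippet above, parsed directly. You could also call
# parser.set_url("https://example.com/robots.txt") and parser.read() to
# fetch the live file from your own site instead.
rules = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot is blocked from every path on the site...
print(parser.can_fetch("GPTBot", "https://example.com/sample-page/"))     # False
# ...while crawlers not named in the file are unaffected by this rule.
print(parser.can_fetch("Googlebot", "https://example.com/sample-page/"))  # True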

Customizing GPTBot Access

To allow GPTBot to access only parts of your site, you can use Allow and Disallow rules in your robots.txt file like the example below, adjusting the directories/paths to your preference:

User-agent: GPTBot
Allow: /directory1
Disallow: /directory2
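
As a quick illustration of how these path rules are evaluated (using the hypothetical /directory1 and /directory2 paths from the snippet and a placeholder example.com domain), the same urllib.robotparser check can be run per path:

from urllib.robotparser import RobotFileParser

# Hypothetical rules matching the snippet above.
rules = """\
User-agent: GPTBot
Allow: /directory1
Disallow: /directory2
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/directory1/post"))  # True  (explicitly allowed)
print(parser.can_fetch("GPTBot", "https://example.com/directory2/post"))  # False (explicitly disallowed)
print(parser.can_fetch("GPTBot", "https://example.com/elsewhere/post"))   # True  (no matching rule, allowed by default)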

Additional Information on GPTBot

OpenAI uses two different user agents, one for web crawling and one for user browsing, but its opt-out system currently treats both the same, so restricting the GPTBot user agent covers both.


Disallowing Other AI Agents

You can disallow additional bots using the same method as above, substituting each bot’s associated user agent. Below is a robots.txt example that disallows a number of commonly known AI agents; the Python sketch after the list shows one way to verify these rules against your site’s live robots.txt:

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: cohere-ai
Disallow: /
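
If you want to confirm that your site’s published robots.txt blocks each of these agents, a small Python check along these lines can help; replace the placeholder example.com with your own domain:

from urllib.robotparser import RobotFileParser

# The user agents from the example above.
AI_AGENTS = [
    "Bytespider", "CCBot", "Diffbot", "FacebookBot", "Google-Extended",
    "GPTBot", "omgili", "anthropic-ai", "Claude-Web", "ClaudeBot", "cohere-ai",
]

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # replace with your own site
parser.read()  # fetches and parses the live robots.txt

for agent in AI_AGENTS:
    allowed = parser.can_fetch(agent, "https://example.com/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")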