Disallowing GPTBot and Other AI Bots From Accessing Your Site Content

OpenAI has documented how its GPTBot crawls the web and ingests content into its models, including basic instructions on what to add to robots.txt to disallow bot access to your site.

Pressable does not currently disallow these bots by default, but it’s possible to block them from accessing your site by following the guide below.

Disallowing GPTBot

To disallow GPTBot from accessing your site entirely, add the following to your site’s robots.txt file:

User-agent: GPTBot
Disallow: /
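
If you would like to confirm the rule behaves as expected, Python’s standard-library urllib.robotparser can parse these directives and report whether a given user agent may fetch a URL. This is only a quick sanity check; the example.com URLs below are placeholders for your own domain.

from urllib.robotparser import RobotFileParser

# The rules from the snippet above, parsed directly. You could also call
# parser.set_url("https://example.com/robots.txt") and parser.read() to
# fetch the live file from your own site instead.
rules = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot is blocked from every path on the site...
print(parser.can_fetch("GPTBot", "https://example.com/sample-page/"))     # False
# ...while crawlers not named in the file are unaffected by this rule.
print(parser.can_fetch("Googlebot", "https://example.com/sample-page/"))  # True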

Customizing GPTBot Access

To allow GPTBot to access only parts of your site, you can use Allow and Disallow rules in your robots.txt file like the example below, adjusting the directories/paths to your preference:

User-agent: GPTBot
Allow: /directory1
Disallow: /directory2
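
As a quick illustration of how these path rules are evaluated (using the hypothetical /directory1 and /directory2 paths from the snippet and a placeholder example.com domain), the same urllib.robotparser check can be run per path:

from urllib.robotparser import RobotFileParser

# Hypothetical rules matching the snippet above.
rules = """\
User-agent: GPTBot
Allow: /directory1
Disallow: /directory2
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/directory1/post"))  # True  (explicitly allowed)
print(parser.can_fetch("GPTBot", "https://example.com/directory2/post"))  # False (explicitly disallowed)
print(parser.can_fetch("GPTBot", "https://example.com/elsewhere/post"))   # True  (no matching rule, allowed by default)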

Additional Information on GPTBot

OpenAI uses two different user agents, one for web crawling and one for user browsing, but its opt-out system currently treats both the same, so restricting the GPTBot user agent covers both.


Disallowing Other AI Agents

You can disallow additional bots using the same method as above, substituting each bot’s associated user agent. Below is a robots.txt example that disallows a number of commonly known AI agents; the Python sketch after the list shows one way to verify these rules against your site’s live robots.txt:

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: cohere-ai
Disallow: /
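
If you want to confirm that your site’s published robots.txt blocks each of these agents, a small Python check along these lines can help; replace the placeholder example.com with your own domain:

from urllib.robotparser import RobotFileParser

# The user agents from the example above.
AI_AGENTS = [
    "Bytespider", "CCBot", "Diffbot", "FacebookBot", "Google-Extended",
    "GPTBot", "omgili", "anthropic-ai", "Claude-Web", "ClaudeBot", "cohere-ai",
]

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # replace with your own site
parser.read()  # fetches and parses the live robots.txt

for agent in AI_AGENTS:
    allowed = parser.can_fetch(agent, "https://example.com/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")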