AI Companies are Forcing Websites to Play Whack-a-Bot
404 Media has an insightful piece on the complexities of correctly blocking bots—a topic perfectly aligned with my recent newsletter on AI and the robots.txt file.
Anthropic, the creators of Claude, are actively crawling and indexing content on the public web. However, the names of the bots and crawlers they use appear to be in flux, changing often enough that it is difficult, if not impossible, to tell AI tools not to consume your content.
I don’t think this is necessarily nefarious, particularly coming from a company that emphasizes AI safety. But the constant name-changing makes it hard to be sure you’re blocking the bots you intend to block.
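To make the whack-a-bot game concrete, here is roughly what blocking Anthropic looks like in robots.txt today. This is a sketch: the agent names below are the ones that have been publicly reported for Anthropic's crawlers, and the list may already be stale, which is exactly the problem.

    # Block the crawler name Anthropic is currently reported to use
    User-agent: ClaudeBot
    Disallow: /

    # Block the older names that have also been associated with Anthropic
    User-agent: anthropic-ai
    Disallow: /

    User-agent: Claude-Web
    Disallow: /

Every time a name changes, site owners have to notice and add a new entry, while the old entries sit there doing nothing.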
In an ideal scenario, robots.txt would support a whitelist approach, letting us specify exactly who can access our content and compelling companies to maintain consistent bot names. Alternatively, it might be time to adopt Reddit’s approach and block everything, serving both search engines and AI bots a 404 error page.
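For what it’s worth, robots.txt can already express an allow-list of sorts by disallowing everyone and then carving out exceptions, but it only helps if crawlers use stable names and actually honor the file. A minimal sketch, with Googlebot standing in as an example of a crawler you choose to allow:

    # Deny all crawlers by default
    User-agent: *
    Disallow: /

    # Explicitly allow a named crawler (an empty Disallow permits everything)
    User-agent: Googlebot
    Disallow:

The catch is the same as above: the allow-list only means something if the bots you care about identify themselves consistently and respect the rules.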