# Code: https://github.com/ellie/notes
# Source: https://darkvisitors.com/

# Disallow newsletters
User-agent: *
Disallow: /newsletter/

User-agent: GoogleBot
Disallow: /newsletter/

# OpenAI, ChatGPT
# https://platform.openai.com/docs/gptbot
User-agent: GPTBot
User-agent: GPTBot-User
User-agent: ChatGPT
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
Disallow: /

# Google AI (Bard, etc.)
# https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
User-agent: Google-Extended
Disallow: /

# Amazon
User-agent: Amazonbot
Disallow: /

# Block Common Crawl
# I have mixed feelings about this one: many models are trained on this data,
# but it is also used to bootstrap new search indices.
# https://commoncrawl.org/ccbot
User-agent: CCBot
Disallow: /

# Facebook
# https://developers.facebook.com/docs/sharing/bot/
User-agent: FacebookBot
Disallow: /

# Cohere.ai
# https://darkvisitors.com/agents/cohere-ai
User-agent: Cohere-ai
Disallow: /

# Perplexity
# https://docs.perplexity.ai/docs/perplexitybot
User-agent: PerplexityBot
Disallow: /

# Anthropic
# https://darkvisitors.com/agents/anthropic-ai
# https://darkvisitors.com/agents/claudebot
User-agent: Anthropic-ai
User-agent: ClaudeBot
User-agent: Claude-Web
Disallow: /

# Apple
User-agent: Applebot-Extended
Disallow: /

# Awario
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
Disallow: /

# Other AI companies
User-agent: Omgili
User-agent: Omgilibot
User-agent: Bytespider
User-agent: DataForSeoBot
User-agent: ImagesiftBot
User-agent: Magpie-crawler
User-agent: YouBot
User-agent: Peer39_crawler
User-agent: Peer39_crawler/1.0
Disallow: /

# Old blog posts
User-agent: *
Disallow: /posts-old/
Disallow: /posts-old1/

# Block SISTRIX
User-agent: SISTRIX Crawler
Disallow: /

User-agent: sistrix
Disallow: /

User-agent: 007ac9
Disallow: /

User-agent: 007ac9 Crawler
Disallow: /

# Block UptimeRobot
#User-agent: UptimeRobot/2.0
#Disallow: /

# Block Ezooms Robot
#User-agent: Ezooms Robot
#Disallow: /

# Block Perl LWP
#User-agent: Perl LWP
#Disallow: /

# Block netEstate NE Crawler (+http://www.website-datenbank.de/)
#User-agent: netEstate NE Crawler (+http://www.website-datenbank.de/)
#Disallow: /

# Block WiseGuys Robot
#User-agent: WiseGuys Robot
#Disallow: /

# Block Turnitin Robot
#User-agent: Turnitin Robot
#Disallow: /

# Block Heritrix (used by the Internet Archive)
#User-agent: Heritrix
#Disallow: /

# Block pricepi
User-agent: pimonster
Disallow: /

# Block other crawlers
User-agent: SurdotlyBot
Disallow: /

User-agent: ZoominfoBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: barkrowler
Disallow: /