• cmnybo@discuss.tchncs.de
    English
    arrow-up
    19
    ·
    2 months ago

    Are there any good log monitoring programs that will automatically blacklist the IP of any crawler that ignores robots.txt?

    • midribbon_action@lemmy.blahaj.zone
      arrow-up
      8
      ·
      2 months ago

      Yeah, I’ve been curious whether you could explicitly disallow a page in robots.txt, hide an invisible link to that same page in your footer, then have any request for it trigger an immediate IP block.
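
For what it's worth, the detection half of that idea can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the trap lives at a hypothetical `/trap/` path (disallowed in robots.txt and linked invisibly from the footer), and the server writes Common/Combined Log Format access logs. The names `TRAP_PATH`, `LOG_LINE`, and `offending_ips` are made up for illustration, and the actual blocking (firewall rule, deny list, etc.) is left out.

```python
import re

# Hypothetical trap path: disallowed in robots.txt and linked invisibly from the footer.
TRAP_PATH = "/trap/"

# Matches the start of a Common/Combined Log Format line:
# client IP, identity, user, [timestamp], then the quoted request line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:[A-Z]+) (\S+)')


def offending_ips(log_lines, trap_path=TRAP_PATH):
    """Return the set of client IPs that requested the trap path."""
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, target = match.group(1), match.group(2)
        # Ignore any query string when comparing against the trap path.
        if target.split("?", 1)[0].startswith(trap_path):
            ips.add(ip)
    return ips
```

Feeding it the lines of an access log gives you the set of addresses to hand to whatever does the blocking. A well-behaved crawler never shows up in that set, since it never requests a disallowed path.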

      • mic_check_one_two@lemmy.dbzer0.com
        English
        arrow-up
        4
        ·
        2 months ago

        There are systems that use a hidden hyperlink (which only a bot would see and follow) to direct crawlers into an infinitely deep and wide tree of junk links. The bot ends up trapped in bot-purgatory and stops crawling the rest of your site.

        The issue is that it means you end up consuming resources just to keep the bot trapped.
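
To make the link-tree idea concrete, here is a minimal Python sketch, not how any particular tarpit actually does it. It assumes every page under a hypothetical `/maze` prefix is generated on the fly: each page links to `fanout` child pages whose names are hashed from the parent path, so the tree is effectively endless but nothing is stored per page. The function name `junk_page` and the `fanout` parameter are made up for illustration.

```python
import hashlib
from html import escape


def junk_page(path, fanout=5):
    """Build an HTML page whose links lead to further generated pages under `path`."""
    base = path.rstrip("/")
    links = []
    for i in range(fanout):
        # Derive each child name from the parent path, so no state is kept per page.
        token = hashlib.sha256(f"{base}/{i}".encode()).hexdigest()[:12]
        href = f"{base}/{token}"
        links.append(f'<a href="{escape(href)}">{token}</a>')
    return "<html><body>" + "\n".join(links) + "</body></html>"
```

Because each page is computed only from its own path, the per-request cost is a handful of hashes, but you still pay CPU and bandwidth for every request the bot makes.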