Cloudron Forum

Is there a way to rate limit connections to a site for certain user agent strings?

WordPress (Developer)
14 Posts 8 Posters 917 Views 8 Watching
    d19dotca wrote:

    Hello,

    I have a particular website that for the last 2+ days has been reaching max memory and restarting frequently, a dozen times a day. I've tried increasing the memory which has helped of course but that's only a temporary workaround. The issue started when (according to the logs) the site started receiving an onslaught of traffic from Facebook crawler bots, specifically their Meta-ExternalAgent/1.1 one.

    What I'd like to do is try to rate limit (within Cloudron if possible) the requests from certain user agents, to maybe 10 a minute for example instead of several a second (which is currently what I'm seeing). If this is possible, I'd love to know.

    I may be able to use a WordPress plugin for this, but I think each request would still occupy an Apache connection, so the connection pool could still saturate. In fact, I tried doing this in .htaccess with something ChatGPT recommended, but it only throttles the data rate and doesn't slow down the indexing from Facebook / Meta, so I suspect it simply increases connection saturation, since each request takes a bit longer to respond to.

    # BEGIN Meta-ExternalHit Throttling
    <IfModule mod_rewrite.c>
        RewriteEngine On
        # Detect Meta-ExternalHit user agent
        RewriteCond %{HTTP_USER_AGENT} "Meta-ExternalHit" [NC]
        # Set an env var if matched
        RewriteRule ^ - [E=IS_META_BOT:1]
    </IfModule>
    
    <IfModule mod_ratelimit.c>
        # Apply rate limit if Meta bot detected
        SetEnvIf IS_META_BOT 1 META_BOT
        <IfModule mod_filter.c>
            AddOutputFilterByType RATE_LIMIT text/html text/plain text/xml application/json application/xml image/jpeg image/png image/webp image/avif
        </IfModule>
        # Limit to ~50 KB/s (value is KB per second)
        SetEnvIf META_BOT 1 RATE_LIMIT 50
    </IfModule>
    # END Meta-ExternalHit Throttling
    

    This is leading to the health checks taking over 7000ms as well which I see in the logs.

    Thank you in advance for any advice.
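
    A minimal sketch of a cheaper alternative, assuming Apache 2.4 with mod_rewrite and that .htaccess overrides are honoured for this app (both are assumptions, not confirmed in this thread). Two caveats on the snippet above: it tests for Meta-ExternalHit while the logs reportedly show Meta-ExternalAgent/1.1, so the condition may never match, and mod_ratelimit caps bandwidth per response rather than the number of requests. The sketch below does not rate limit either; it refuses the crawler outright with a 403 before WordPress handles the request, which should at least keep PHP out of the picture.

    ```apache
    # Sketch: refuse one crawler by User-Agent before WordPress/PHP sees the request.
    <IfModule mod_rewrite.c>
        RewriteEngine On
        # Case-insensitive substring match; the token comes from the log line quoted above.
        RewriteCond %{HTTP_USER_AGENT} "meta-externalagent" [NC]
        # "-" means no substitution; [F] answers 403 Forbidden and stops rewriting.
        RewriteRule ^ - [F]
    </IfModule>
    ```

    In a WordPress .htaccess this would need to sit above the "# BEGIN WordPress" block so that it is evaluated first. Each refused request still briefly occupies a connection, and counting requests per minute (the "10 a minute" goal) would likely need a third-party module or something in front of Apache.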

    luckow (translator) wrote (#3):

    @d19dotca Install your own WAF. We have been testing https://www.bunkerweb.io/ for almost a month. And it works.

    Pronouns: he/him | Primary language: German

      robi wrote (#4):

      Another thought is to inspect their robots.txt for directives aimed at their bots, which you could adapt for your needs.

      Conscious tech

        necrevistonnezr wrote (#5):

        @luckow said in Is there a way to rate limit connections to a site for certain user agent strings?:

        @d19dotca Install your own WAF. We have been testing https://www.bunkerweb.io/ for almost a month. And it works.

        Would that be interesting as a Cloudron service?

          jdaviescoates wrote (#6):

          @luckow said in Is there a way to rate limit connections to a site for certain user agent strings?:

          @d19dotca Install your own WAF. We have been testing https://www.bunkerweb.io/ for almost a month. And it works.

          Sounds good. How?

          I use Cloudron with Gandi & Hetzner

            luckow (translator) wrote (#7):

            @jdaviescoates
            The good old traditional method: https://docs.bunkerweb.io/latest/integrations/#linux
            Runs on a CX22 on https://www.hetzner.com/cloud/.

            Bunkerweb acts as a reverse proxy for a Cloudron app that is ‘behind it’. Currently, we only use it in front of our own website (mainly because we are still learning, e.g. what happens when we block bots? Oh, there is no longer support for previews in rocket.chat). In my next spare moment, I'll try out what happens when a complete Cloudron instance is behind Bunkerweb. It should work. From what I've heard, this is the case with Cloudflare, and Bunkerweb is similar (only self-hosted) 🙂

            Pronouns: he/him | Primary language: German

              andreasdueren wrote (#8):

              @d19dotca Think what you want of Cloudflare, but their caching is pretty good, plus they also hate AI bots and have specific options to block them: https://developers.cloudflare.com/ai-crawl-control/

                d19dotca wrote (#9):

                Thank you all for the suggestions! Good ideas!

                It turns out that overnight, after my message to the forum, the bot traffic finally went back to normal levels, and the app has been stable ever since. But this definitely reminded me that a good WAF (or at minimum an improved robots.txt) can be important and needs to be evaluated.

                Hopefully Cloudron can integrate a simple WAF directly into the system in the future (maybe even using BunkerWeb, if possible). 🤞

                --
                Dustin Dauncey
                www.d19.ca

                  imc67 (translator) wrote (#10):

                  @luckow said in Is there a way to rate limit connections to a site for certain user agent strings?:

                  Bunkerweb acts as a reverse proxy for a Cloudron app that is ‘behind it’. Currently, we only use it in front of our own website (mainly because we are still learning, e.g. what happens when we block bots? Oh, there is no longer support for previews in rocket.chat). In my next spare moment, I'll try out what happens when a complete Cloudron instance is behind Bunkerweb. It should work. From what I've heard, this is the case with Cloudflare, and Bunkerweb is similar (only self-hosted) 🙂

                  Hi @luckow I'm really curious how it went with Bunkerweb in front of Cloudron?

                  I am moving domains from Cloudflare to deSEC but can't move all of them, because I use Cloudflare WAF for some Cloudron apps (geoblocking and/or an IP whitelist with DDNS/API at app level). Since Cloudron doesn't have anything like a WAF (what a pity), could Bunkerweb be a workaround?

                    luckow (translator) wrote (#11):

                    @imc67 I never found the time to delve deeper into the test system (a WAF, here Bunkerweb, in front of a dedicated Cloudron instance). In fact, I completely lost track of my own challenge from months ago. Thanks for bringing it up again. 🙂

                    With Bunker we're on the free tier. One thing missing in the free tier is reporting/monitoring over a longer period, so we have no direct insight into how much the WAF filters out. But from our experience with one app on Cloudron (our own website): no downtime, no stress, nothing. Everything worked as expected after some manual configuration.

                    1. From a marketing perspective: filtering out bots causes problems. No link previews in LinkedIn, Rocket.Chat, Signal... We solved that with allowlists for some User-Agents. But in the long run it feels wrong to pay attention only to User-Agents. Bad bots find ways to adopt the "good" User-Agents, and in that case I don't think the WAF will work. We'll see.

                    2. Our website runs on Drupal. We added custom rules to forbid certain URL structures. What we learned: Some editors use workflows that generate URL structures that were forbidden. So they asked the Bunker administrators to change the structure to enable their work.

                    3. Our first update of Bunkerweb ended directly in a disaster. The maintainers rolled back the update, we reverted the version and a few days later a new update was released. That works. The last two updates worked without problems.

                    4. Is the time investment worth it? I think so. There are so few answers when it comes to alternatives to Cloudflare, and we need a solid free and open-source one. What we learned: when it works, it works. You have to learn some new terminology and technologies. It makes us stronger in our decisions and better at consulting. Is it as good as Cloudflare? Maybe later, from my point of view. Don't forget that DoS attacks are an important issue, and that is not solved by the free version of Bunkerweb on a Hetzner VPS.

                    Once I find time to dedicate myself to the test system (Cloudron instance behind Bunkerweb) again, I will post an update. Many thanks for the reminder.
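
                    The allowlist idea from point 1 can be illustrated outside Bunkerweb too. This is a minimal sketch in Apache 2.4 .htaccess syntax (the format of the snippet earlier in the thread), not Bunkerweb configuration. The variable name blocked_bot is hypothetical, the User-Agent patterns are illustrative rather than a vetted list, and it assumes mod_setenvif and mod_authz_core are available and allowed in .htaccess.

                    ```apache
                    # Sketch: flag generic crawler User-Agents, then exempt a short allowlist.
                    <IfModule mod_setenvif.c>
                        # Broad pattern (illustrative): sets blocked_bot for anything that looks like a crawler.
                        SetEnvIfNoCase User-Agent "(bot|crawl|spider)" blocked_bot
                        # Allowlist (illustrative): "!" unsets the flag again for link-preview fetchers.
                        SetEnvIfNoCase User-Agent "(facebookexternalhit|LinkedInBot)" !blocked_bot
                    </IfModule>

                    <IfModule mod_authz_core.c>
                        # Admit everyone except requests still flagged after the allowlist pass.
                        <RequireAll>
                            Require all granted
                            Require not env blocked_bot
                        </RequireAll>
                    </IfModule>
                    ```

                    The weakness described in point 1 is visible here: any client that sends an allowlisted User-Agent string passes, because the header is entirely under the client's control.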

                    Pronouns: he/him | Primary language: German

                      imc67 (translator) wrote (#12):

                      @luckow thanks for sharing your experience so far in such detail! I was searching the web (partly via AI) for Cloudflare WAF alternatives, and it's really unbelievable how rare they are!

                      As long as Cloudron doesn't have anything like a WAF (at the app and URL-path level), anyone in Europe who wants to leave Cloudflare doesn't have much choice 😫

                        humptydumpty wrote (#13):

                        @imc67 https://bunny.net/blog/bunny-shield-waf-fast-flexible-and-regex-ready/

                        https://bunny.net/shield/

                          robi wrote (#14):

                          Curious: why isn't an app-level WAF, like the ones for WP, suitable?

                          Conscious tech
