• Resolved jamiebaker

    (@jamiebaker)


    I’ve used Wordfence for several years on shared hosting. Lately I’ve had a lot of usage issues with my site going down with 508 warnings. It took me some time to figure out why, but it is Googlebot. It hits my site repeatedly every day, and the periods when it is re-indexing my site match up precisely with when my site gets 508 errors. Previously the rate limiting tool in Search Console kept Googlebot at bay, but not anymore.

    Wordfence’s own rate limiting settings seem to work very well on other bots but not on Googlebot for some reason. I have a message in to Google, but in the meantime is there any guidance on Wordfence settings that can throttle Googlebot? What I have now does not seem to work.

    Thank you!

Viewing 14 replies - 1 through 14 (of 14 total)
  • Plugin Support wfmargaret

    (@wfmargaret)

    Hi @jamiebaker,

    Thanks for reaching out! By default, Wordfence doesn’t rate-limit Google crawlers. Typically, Googlebot should acknowledge your crawl rate requests and automatically adjust to the site.

    To rate-limit Googlebot, select “Treat Google like any other Crawler” for the setting “How should we treat Google’s crawlers” in Wordfence > Firewall > Rate Limiting.

    Please let me know if you have any questions!

    Thanks,
    Margaret

    Thread Starter jamiebaker

    (@jamiebaker)

    Thanks for getting back to me so quickly!

    Yes, I already have Treat Google like any other Crawler turned on but it doesn’t seem to make any difference. Can you think of anything that might interfere with that setting?

    Also when you say “Googlebot should acknowledge your crawl rate requests” where are those requests made? I used to see some interface in Search Console but that interface disappeared (for me anyway) last year.

    Plugin Support wfmargaret

    (@wfmargaret)

    Hi @jamiebaker,

    Googlebot should slow down its crawling if it runs into error codes, such as a 503 from rate-limiting. If this isn’t happening on your site, according to their documentation, you can report a problem here: https://developers.google.com/search/docs/crawling-indexing/reduce-crawl-rate#exceptional-requests-to-reduce-crawl-rate

    So I can take a look at your settings, can you send a diagnostic report to wftest @ wordfence . com? You can find the link to do so at the top of the Wordfence > Tools > Diagnostics page. Then click on “Send Report by Email”. Please add your forum username where indicated and respond here after you have sent it.

    Thanks,
    Margaret

    Thread Starter jamiebaker

    (@jamiebaker)

    Margaret

    thank you for all that info. I sent an email to the address you mentioned.

    thanks!

    James

    Plugin Support wfmargaret

    (@wfmargaret)

    Hey James,

    Thanks for sending that! Your settings overall look good!

    If you’re still experiencing downtime due to an aggressive crawl rate, you might be able to set stricter rate-limiting settings. If you use stricter settings, make sure to keep an eye on your Live Traffic log for a few days to ensure you’re not throttling regular visitors.

    Please let me know if you have any questions!

    Thanks,
    Margaret

    Thread Starter jamiebaker

    (@jamiebaker)

    Margaret

    unfortunately, even the strictest rate-limiting settings in Wordfence don’t slow Googlebot down. It is still bombing my site and causing 508 errors. Are you aware of any other plugins, or settings elsewhere, that might interfere with WordPress doing its thing?

    thank you!

    James

    Thread Starter jamiebaker

    (@jamiebaker)

    if I am using the strictest settings to rate-limit google but it still hammers my site daily, could there be a problem elsewhere in my site that affects Wordfence?

    JB

    Thread Starter jamiebaker

    (@jamiebaker)

    Esther a way in WF to send a 429 message to Google?

    Thread Starter jamiebaker

    (@jamiebaker)

    Sorry, that last question was supposed to say- “is there a way in WF to send a 429 message to Google, just to get it to slow down?”

    I’ve read that it is supposed to be self-regulating and smart enough to know when it is overpowering a shared site, but that is not happening in my case.

    JB

    Plugin Support wfmargaret

    (@wfmargaret)

    Hey James,

    I’m sorry about the delay in getting back to you, and thank you for your patience while I was out last week. When Wordfence throttles Googlebot, it returns a 503. As you said, Googlebot should understand this response and slow its crawl rate.

    Can you send me your access logs from around the time the site is being crawled too heavily, please? Your host should be able to provide those if you don’t regularly have access to them. You can email them to wftest @ wordfence . com. Please use your forum username as the subject and let me know here once you’ve sent those!

    Thanks,
    Margaret

    Thread Starter jamiebaker

    (@jamiebaker)

    Margaret

    I sent the access logs!

    James

    PS: I hope you were out on vacay instead of being sick!

    Plugin Support wfmargaret

    (@wfmargaret)

    Hey James,

    Thanks for sending the logs over and for the well wishes! I was out for positive reasons, thankfully!

    In the section you highlighted, the IP address 34.174.*** accessed the site successfully only 2 times before the 508 response started. Since a 508 response indicates the server has run out of resources, it’s sent by the server itself before the request reaches Wordfence, which is why the rate limiting didn’t kick in. The hostname on the IP is *.bc.googleusercontent.com. This is often confused for Googlebot, but it’s actually the hostname used by Google-hosted cloud servers for users, rather than being used by Google itself.

    Checking through the access logs, I also see a few times where there are bots spoofing the user agent Googlebot. For example, on 9/25 at 5:30 am, the IP address 79.127.*** uses the Googlebot user agent, but is not a legitimate Googlebot. I see they had approximately 60 accesses over the course of 2 minutes before the server started to return 508 responses. This would not have been throttled by the rate-limiting settings in the original diagnostics I received.
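
    A pattern like this can be spotted by tallying requests from clients that claim to be Googlebot. The following is a minimal Python sketch, not part of Wordfence. It assumes the host writes Apache/Nginx "combined" format logs, and the names `LOG_LINE` and `summarize_googlebot_claims` are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: the access log uses the Apache/Nginx "combined" format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_googlebot_claims(lines):
    """Map each IP that sends a Googlebot user agent to its status-code counts."""
    per_ip = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        # The user agent is only a claim; it says nothing about who sent it.
        if "googlebot" in match["agent"].lower():
            per_ip[match["ip"]][match["status"]] += 1
    return per_ip
```

    Passing it the lines of an access log gives one counter per IP, so an address with many requests followed by a run of 508 responses stands out. The user agent alone does not prove the client is Google, so each IP found this way still needs to be verified.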

    If you’d like to verify legitimate Googlebot traffic in the future, you can use reverse DNS lookup or check against Google’s published IP ranges. Google has some solid details on that here: https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot
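
    As a rough illustration of the reverse DNS approach, the Python sketch below looks up the hostname for an IP, checks its domain, then resolves that hostname forward and confirms it maps back to the same IP. The function name and the injectable lookup parameters are illustrative assumptions, and the accepted domains are limited here to `googlebot.com` and `google.com`; Google's documentation remains the authority on the current list.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes a reverse-then-forward DNS check."""
    # Default resolvers use the standard library. They are parameters so the
    # logic can be exercised with stub lookups and no network access.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: [
            info[4][0] for info in socket.getaddrinfo(host, None)
        ]

    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure

    # Assumption: only these two domains are treated as Googlebot here.
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False

    try:
        # The forward lookup must lead back to the original IP.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

    With this check, an address whose hostname ends in `.bc.googleusercontent.com` returns `False`, as does a spoofed user agent coming from an unrelated IP.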

    If you often see a variety of IP addresses before the resource limits have been hit, it might be worth using a service with DDoS protection, such as Cloudflare, to add an extra layer of security to the site.

    If your server’s resource limits are fairly low, you might consider setting the rate limits as low as 60 requests per minute, but please keep an eye on your Live Traffic log to ensure that regular visitors aren’t being blocked.
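
    To make "60 requests per minute" concrete, here is a minimal sliding-window sketch in Python. It is not how Wordfence implements throttling; the class name, method name, and defaults are assumptions chosen for illustration. It simply answers 503 once a client exceeds the limit within the window.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests=60, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of allowed hits

    def status_for(self, ip, now):
        """Return the HTTP status to send for a request from `ip` at time `now`."""
        hits = self._hits[ip]
        # Forget hits that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return 503  # throttled
        hits.append(now)
        return 200
```

    A check like this only helps if the request reaches it. When the server itself answers 508 first, no application-level limiter gets a chance to run.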

    It may also be a good idea to check with your host. They should be able to help review for any potential issues leading to higher-than-normal resource usage.

    Please let me know if you have any questions!

    Thanks,
    Margaret

    Thread Starter jamiebaker

    (@jamiebaker)

    Margaret

    thank you for looking at my access logs. I still can’t figure out why the bots bomb my site daily. I only update it once a week, and my headers should tell bots this. Maybe they have become scrambled somehow.

    I just turned the rate limiting way up. Hopefully that will fix the 508s in the short term. In the long term I may need better hosting.

    thanks again!

    JB

    Plugin Support wfmargaret

    (@wfmargaret)

    Hey JB,

    Since one of the bots is spoofing the Googlebot user agent, I suspect they wouldn’t be well-behaved on the site. It did look like, at several points, the 508 errors started very early into the bots’ access attempts, so it could be time to move to hosting with more resources available.

    We don’t normally recommend permanent blocks, as IP addresses can change pretty often, but if you see the same IP addresses attacking the site over and over, it might be worthwhile in your case to permanently block the biggest offenders. Since your firewall was optimized, they’d be served the block page before any other plugins load, which would help you save server resources.

    I’m going to go ahead and resolve this ticket, but please let me know here if you have any other questions, and I’ll be happy to help!

    Thanks,
    Margaret

The topic ‘rate limiting googlebot?’ is closed to new replies.