403 forbidden handling
-
It seems some websites are blocking Broken Link Checker, resulting in 403 Forbidden errors. One of those is Kickstarter.
It might be a good idea to have a setting to ignore 403 forbidden errors.
-
Hi @cnxsoft
I hope you are doing good today.
Broken Link Checker uses the server IP to check links; if the destination site refuses the connection, the link will show as broken. Could you review your plugin list and see whether any security plugins are installed? If so, disable them and see if this helps.
We have had such requests in the past, and in most cases a security plugin was the issue.
Kind Regards,
Kris
I get these too, and they’re more likely due to the user agent used than the IP address being blocked or a security plugin on the local site.
I think that when Broken Link Checker gets a 403, it should retest the link after a long enough interval using a desktop or mobile user agent to confirm.
I don’t have any security plugin that could impact the Link Checker plugin. I’ve actually installed WordFence, but the 403 and, in some cases, 503 issues showed up before that.
It also only affects a few websites, such as Amazon and Kickstarter. Here’s what I get when testing directly on my server with wget.
With Amazon:
wget https://www.amazon.com
--2022-07-21 09:32:14-- https://www.amazon.com/
Resolving www.amazon.com (www.amazon.com)... 13.227.67.197
Connecting to www.amazon.com (www.amazon.com)|13.227.67.197|:443... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2022-07-21 09:32:14 ERROR 503: Service Unavailable.
and Kickstarter:
wget https://www.kickstarter.com/
--2022-07-21 09:34:45-- https://www.kickstarter.com/
Resolving www.kickstarter.com (www.kickstarter.com)... 2606:4700::6812:1ba8, 2606:4700::6812:1aa8, 104.18.26.168, ...
Connecting to www.kickstarter.com (www.kickstarter.com)|2606:4700::6812:1ba8|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-07-21 09:34:45 ERROR 403: Forbidden.
Trying to repeat that on my local PC returns the same errors, so it should not be IP blocking, but likely related to the user agent, as those two websites (and others) are attempting to block bots.
Gal’s idea would be good if it can be implemented.
@cnxsoft Try varying your user agent from the default one used by wget. For example, try this:
wget -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" https://www.amazon.com/
Also, the mention of your local PC makes me think of reverse DNS lookup. Some security systems use this (PTR records) to avoid spoofing. So check your domain (like this one).
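For anyone who wants to script the PTR check mentioned above, here is a minimal sketch of a forward-confirmed reverse DNS lookup using Python's standard `socket` module. The function name `forward_confirmed_rdns` and its `reverse`/`forward` parameters are hypothetical, introduced only for illustration, and the forward lookup used here covers IPv4 addresses only.

```python
import socket


def forward_confirmed_rdns(ip, reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return (hostname, confirmed) for an IPv4 address.

    hostname is the PTR name, or None when no PTR record is found.
    confirmed is True only when the PTR name resolves back to `ip`.
    The resolver callables are parameters so they can be swapped out
    for testing; by default they are the standard socket functions.
    """
    try:
        # Reverse lookup: IP address -> PTR hostname.
        hostname, _aliases, _addresses = reverse(ip)
    except (socket.herror, socket.gaierror):
        return None, False
    try:
        # Forward lookup: hostname -> list of IPv4 addresses.
        _name, _aliases, addresses = forward(hostname)
    except socket.gaierror:
        return hostname, False
    return hostname, ip in addresses
```

Calling `forward_confirmed_rdns("13.227.67.197")` would perform live DNS queries, so the result depends on the resolver in use. A missing or mismatched PTR record is only a hint, not proof, of why a remote site rejects requests.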
Hi all.
We apologize for the delay on our side. It seems an email notification from this thread escaped our attention. I have pinged our BLC team, which is now working on a new engine, to review your query.
Kris
Hi again.
Please check this temporary solution which can help:
https://ww.wp.xz.cn/support/topic/please-update-ua-string-once-in-a-while/#post-15881270
Also, note that we are working on a new engine. With the future BLC link crawling service, we will maintain a list of such sites and auto-ignore those specific issues.
Kris
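To make the "maintain a list and auto-ignore" idea concrete, here is a rough Python sketch. It is not how the BLC engine works; the names `BOT_BLOCKING_HOSTS`, `AMBIGUOUS_STATUSES` and `classify` are hypothetical, and the domain list contains only the two sites named in this thread.

```python
from urllib.parse import urlsplit

# Assumption: an illustrative list, taken from the sites mentioned in this thread.
BOT_BLOCKING_HOSTS = {"amazon.com", "kickstarter.com"}

# Status codes that these sites were seen returning to automated requests.
AMBIGUOUS_STATUSES = {403, 503}


def classify(url, status):
    """Return "ok", "ignored" or "broken" for a checked link."""
    host = (urlsplit(url).hostname or "").lower()
    # Match the listed domain itself or any of its subdomains.
    known_blocker = any(
        host == domain or host.endswith("." + domain)
        for domain in BOT_BLOCKING_HOSTS
    )
    if status in AMBIGUOUS_STATUSES and known_blocker:
        return "ignored"
    if 200 <= status < 400:
        return "ok"
    return "broken"
```

With this sketch, a 403 from `https://www.kickstarter.com/` is reported as "ignored", while a 404 from the same site is still reported as "broken". A link that the listed site really does forbid would be ignored as well, which is the trade-off of this approach.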
Thanks for this, Kris. As you say, that’s a workaround.
Rather than ignoring sites, which possibly means not finding out when the links to them actually break, it would be good to have a hierarchy of user agents that the plugin uses, with the last ones tried being a popular desktop UA and a popular mobile UA.
This way, the plugin tries with the default server UA first, but if this gets blocked, it tries the next one on the list.
It might even be good to expose the list of user agents as a setting, and allow admins to modify it, with a “reset” button, just in case 😉
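The user agent hierarchy described above could look roughly like the following Python sketch. This is only an illustration of the idea, not the plugin's code: `USER_AGENTS`, `BLOCKED_STATUSES`, `fetch_status` and `check_link` are hypothetical names, the browser UA strings are examples that will go out of date, and the "retest after a long enough interval" part is left out.

```python
import urllib.error
import urllib.request

# Tried in order. None means "send the HTTP client's default user agent".
# Assumption: the browser strings below are examples only.
USER_AGENTS = [
    None,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 15_5 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.5 "
    "Mobile/15E148 Safari/604.1",
]

# Responses that may mean "bot blocked" rather than "link broken".
BLOCKED_STATUSES = {403, 503}


def fetch_status(url, user_agent=None, timeout=10):
    """Return the HTTP status code for url.

    Connection failures (urllib.error.URLError) are not handled here
    and propagate to the caller.
    """
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; keep the code.
        return error.code


def check_link(url, user_agents=USER_AGENTS, fetch=fetch_status):
    """Try each user agent in turn until one is not blocked.

    Returns (status, user_agent) for the last attempt made.
    """
    result = (None, None)
    for user_agent in user_agents:
        result = (fetch(url, user_agent), user_agent)
        if result[0] not in BLOCKED_STATUSES:
            break
    return result
```

`check_link("https://www.kickstarter.com/")` would make live requests, so what it returns depends on how the site responds at the time. Passing a different `user_agents` list is where an admin-editable setting would plug in.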