Rescan broken links?
-
With the old version, you could rescan links listed as broken. In the cloud version, I don’t see that option. I have a ton of links marked as 403 and 404 broken links that I know are good. How can I set them all to be rescanned via the cloud version? Thanks.
-
Hi @jamminjames,
I hope you are keeping well, and thank you for reaching out to us.
The Cloud Version lets you start a new scan for your site. If you are hoping to scan specific URLs on their own, I understand this can cause some confusion. The Cloud Version does not include a way to check broken links one by one, and scans can only be run for the entire site.
Best Regards,
Nebu John
So if you start a new scan, will those broken links still be listed? Or will it wipe those out and start over?
Hello @jamminjames
Since it scans the entire website each time you run a new scan, the broken link list will be updated with each scan when you are using the Cloud version.
In other words, it clears the current list and creates a new one.
Best Regards
Amin
Okay, I came back to the cloud version after a year or so to try it again, hoping you folks had corrected the time problem, but it still times out. When it does, there’s a notice saying ‘we hope to increase the time limit soon’ or something like that, but it’s been that way for over a year at least. So, unfortunately, it’s no good for a large site.
So, I’m going back to the website version. But the reason I tried cloud again was that the website version flagged about 2,000 links as ‘forbidden,’ but they’re fine. I set them all to rescan, but they stay on the list, so I’m not sure they really are set to rescan.
Hello @jamminjames,
I hope things are going well for you.
I understand your concern, but at this time we have no good news regarding an increase to the limit, which is already set to be the same as for free users.
“…the website version flagged about 2,000 links as ‘forbidden,’ but they’re fine.”
Would you mind sharing the site URL so we can review it from HUB and identify which of the 2000 links are shown as Forbidden and who is blocking the BLC system from scanning it?
“…but they stay on the list, so I’m not sure they really are set to rescan.”
Do you mean to re-scan the BLC Local Version? If yes, navigate to Local [OLD] > Settings tab > Forced recheck > Re-check all pages. This way, the Local BLC will start scanning all the pages.
–
Kind Regards,
Imran Khan
Website: humortimes.com
The false “forbidden” flags were caused by a bot restriction setting we had in our Cloudflare CDN. We have since adapted that rule so that anything originating from our website will be skipped (using the Local BLC version; not sure if we’d need a different rule change if using the Cloud BLC).
Hi @jamminjames,
Can you please try whitelisting Broken Link Checker IPs for the Cloud version and see if that helps? Please find the IP details and more details on our documentation here: https://wpmudev.com/docs/getting-started/wpmu-dev-ip-addresses/
Let us know how that goes.
Best Regards,
Nebu John
The problem we’re having with the Cloud version is the timeout issue, and I don’t think that will help with it. But I will use those IPs if we end up coming back to it when you folks extend the time limit.
Hello @jamminjames
If you keep hitting a timeout or server 25 error, that means you have too many links and BLC can’t finish the scan in time, which could be due to a lack of resources.
The easiest way to fix this is to add the links that you already know are not broken to the ignore list, which can help fix the timeout error. You can use regex to add links in bulk.
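As a rough illustration of what a bulk rule might look like for a site with thousands of images, here is a minimal sketch that tests a candidate pattern against a few sample URLs before it is pasted anywhere. The pattern, the `example.com` URLs, and the helper name are all hypothetical, and the regex flavor BLC accepts is an assumption; check the documentation before relying on it.

```python
import re

# Hypothetical pattern: a URL ending in a common image extension,
# optionally followed by a query string. Not taken from the BLC docs.
IMAGE_URL = re.compile(r"\.(?:jpe?g|png|gif|webp|svg)(?:\?.*)?$", re.IGNORECASE)


def split_by_ignore_rule(urls, pattern=IMAGE_URL):
    """Return (ignored, kept): URLs the pattern matches and those it does not."""
    ignored = [u for u in urls if pattern.search(u)]
    kept = [u for u in urls if not pattern.search(u)]
    return ignored, kept


# Made-up sample URLs, used only to sanity-check the pattern.
sample = [
    "https://example.com/wp-content/uploads/2020/01/cartoon.JPG",
    "https://example.com/wp-content/uploads/logo.png?ver=2",
    "https://example.com/about/",
    "https://example.com/png-guide/",
]
ignored, kept = split_by_ignore_rule(sample)
```

Trying the pattern on a handful of real URLs first makes it easier to see whether it would skip pages you still want checked.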
Please read this document to learn more:
https://wpmudev.com/docs/wpmu-dev-plugins/broken-link-checker/#ignore-url-rules
Best Regards
Amin
“The easiest way to fix this is to add some links to ignore list” – that would not in any sense be “easy”! We have thousands of images on the site.
When you say “could be due to a lack of resources,” do you mean on our server? This happens after 4 hours, I believe it says, so I don’t think it could be our server, it’s got to be a time limit self-imposed by your cloud service, and it’s because we have so many images. I think if it could finish one time, then your cloud service would have the info, and wouldn’t have to scan every image from that point on, correct? So I don’t know why you folks have that arbitrary time limit.
Hello @jamminjames
The BLC scanner can automatically adjust itself based on site errors. If it receives too many errors, it will check fewer links simultaneously, which is why I mentioned it could be related to server resources.
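To make the idea of "checking fewer links simultaneously" concrete, here is a minimal sketch of one common way such throttling can work: halve the number of parallel checks when the error rate is high, and otherwise increase it slowly. This is not BLC's actual algorithm; the function name, the 20% threshold, and the limits are all assumptions chosen for illustration.

```python
def next_concurrency(current, errors, checked, *, floor=1, ceiling=16, threshold=0.2):
    """Pick the number of simultaneous link checks for the next batch.

    Hypothetical policy: if more than `threshold` of the last batch failed,
    halve the concurrency (never below `floor`); otherwise add one
    (never above `ceiling`).
    """
    if checked == 0:
        return current  # nothing measured yet, keep the current setting
    error_rate = errors / checked
    if error_rate > threshold:
        return max(floor, current // 2)
    return min(ceiling, current + 1)
```

Under a policy like this, a server that starts returning many errors would make the scan progressively slower, which is how a server-side problem could contribute to hitting a fixed time limit.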
We also have some limits on scan time, because each scan costs us money when using the cloud version. So if your site is very large and has many links, you could still reach the limit. You can learn more about these limits in this doc:
https://wpmudev.com/docs/wpmu-dev-plugins/broken-link-checker/#error-15-timeout
Our team is actively working on improving the Broken Link Checker. What you mentioned also makes sense, but at this moment it will not use the stored data for the next scan.
I will bring your idea to our developers’ attention, but I can’t give you an ETA for implementation.
Kind Regards
Amin
Okay, thanks. At the link you provided, it says, “When scanning a very large site with a huge number of links, the BLC scan will time out if it hits its 3-hour scan limit.” So that’s our situation.
But what it doesn’t say is if the info it collects during that 3 hour limit is used to avoid scanning the same files again next time. If so, then after a few of these timeouts, it should be able to cover the whole site, and would work without timeouts thereafter.
If that is how it works, I wouldn’t mind leaving it on and letting it timeout a few times (I have done that in the past, but maybe I didn’t let it go enough times). If, however, it starts from scratch every time, then it will never get anywhere. Do you know which of the above is true? If the former, then that would be something to suggest to development as well.
I would think it would be possible to use the info collected in each 3 hour scan to avoid duplicating the same scan over and over again. It would also be useful to show the user how many files were scanned and how many there are to go, along with the “3 hour time limit reached” notice, so we know where we stand.
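The suggestion above can be sketched roughly as follows. This is only an illustration of the idea, not how BLC works: `scan_with_budget`, the `cache` dictionary, and the `check` callback are hypothetical names, and a real service would also need to expire old results so that links are eventually rechecked.

```python
import time


def scan_with_budget(urls, check, cache, budget_seconds, clock=time.monotonic):
    """Check only URLs missing from `cache` until the time budget runs out.

    Returns (scanned_this_run, remaining) so the user can see progress.
    `check` is any callable that takes a URL and returns its status.
    """
    deadline = clock() + budget_seconds
    pending = [u for u in urls if u not in cache]
    scanned = 0
    for i, url in enumerate(pending):
        if clock() >= deadline:
            # Budget exhausted: report how many URLs are still unchecked.
            return scanned, len(pending) - i
        cache[url] = check(url)  # store the result for later runs
        scanned += 1
    return scanned, 0
```

Because results are kept between runs, each timed-out run would pick up where the previous one stopped, and the two returned counts are the kind of "scanned so far / still to go" figures that could accompany the time limit notice.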
Hi @jamminjames
The way the scan works is that each time a new scan is run, it scans the site fully once more. As mentioned, our developers are aware of this and are already working on improvements in this area; however, we don’t have an ETA for when those improvements will be implemented.
Kind Regards,
Kris
Ok, thanks.
Hi @jamminjames,
Since we have acknowledged this will be improved in a future update, I’ll mark it as resolved for now.
For any new feature updates, you can get notifications on our progress by subscribing to our roadmap at https://wpmudev.com/roadmap/
Once new versions are released, any pertinent changes will be described in the changelog, which you can find at:
https://ww.wp.xz.cn/plugins/broken-link-checker/#developers
Kind Regards,
Nithin