Why does the crawler need to take several cycles to complete crawling the entire site?
This is not a cycle. Depending on your settings (Guest Mode? WebP? Mobile View?), the crawler must crawl each URL as many times as there are crawlers defined. That means that if you have Guest Mode, WebP and Mobile View enabled, each URL must be crawled 8 times, which is 37032 URLs and not only 4629.
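The arithmetic above can be sketched as follows. This assumes, as the reply implies, that each enabled option (Guest Mode, WebP, Mobile View) doubles the number of crawler variants, so three options give 2³ = 8 crawlers. The function names are hypothetical and are not part of the plugin.

```python
def crawler_variants(enabled_options: int) -> int:
    """Number of crawler variants, assuming each enabled option doubles it."""
    return 2 ** enabled_options


def total_crawl_requests(url_count: int, enabled_options: int) -> int:
    """Each URL is fetched once per crawler variant."""
    return url_count * crawler_variants(enabled_options)


# Guest Mode, WebP and Mobile View enabled: 3 options -> 8 variants
print(total_crawl_requests(4629, 3))  # 37032
# No options enabled: a single crawler
print(total_crawl_requests(4629, 0))  # 4629
```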
Another problem is that I cannot enable manual crawling. After I click it, it says “Start Watching” under “Watch crawler status”. But when the page refreshes after clicking the manual run button, it stops running again.
Press the button twice.
re: Why does the crawler need to take several cycles to complete crawling the entire site?
I do not have WebP or Guest Mode enabled, so there is only one crawler. After one cycle finishes, it runs again an hour later, and so on. Some URLs are missed during the first cycle, and this repeats for almost a day until the site is completely crawled.
Take a look at the crawler documentation. There you will find a description of the “Run Duration” setting, which can cause what you describe.
https://docs.litespeedtech.com/lscache/lscwp/crawler/
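To see how a time-limited run can leave URLs for later cycles, here is a rough model. It assumes that a run only starts new requests while `run_duration_s` has not yet elapsed and that URLs are fetched one after another. The parameter names are hypothetical rather than the plugin's option keys, and the 400-second run below is only an illustrative value, not a claimed default.

```python
import math


def urls_per_run(run_duration_s: float, seconds_per_url: float) -> int:
    """How many URLs one run can start before its time limit (sequential model)."""
    # Requests begin at 0, seconds_per_url, 2 * seconds_per_url, ...
    # strictly before run_duration_s; always allow at least one request.
    return max(1, math.ceil(run_duration_s / seconds_per_url))


def runs_needed(url_count: int, run_duration_s: float, seconds_per_url: float) -> int:
    """How many runs it takes to visit every URL once."""
    return math.ceil(url_count / urls_per_run(run_duration_s, seconds_per_url))


# Illustrative values only: a hypothetical 400 s run, 10 or 15 s per uncached page.
print(runs_needed(4629, 400, 10))  # 116
print(runs_needed(4629, 400, 15))  # 172
```

Under this model the number of runs scales with the time per page, so a faster server reduces the number of cycles just as much as a longer run duration does.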
Are you talking about the crawl interval?
I have already set it to 1800 seconds so that it runs every half hour. As far as I understand, the crawler will not start a new cycle every 1800 seconds if it knows the current crawl is still in progress. The timeout is now increased to 100 seconds, which is more than enough. I checked my pages using Chrome's developer tools, and it takes around 10–15 seconds to open a page without cache.
So why are there so many cache misses that require multiple crawler cycles to cover the entire site?
What you say about your crawler settings fuels the suspicion that they are not only wrong but also overwhelming your server. If you say it takes about 3 hours for all URLs to be crawled, but you set an interval of 1800 seconds (0.5 h), it doesn’t take a calculator to see that the interval is far too short.
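The back-of-the-envelope check can be made explicit with the figures from this thread (4629 URLs, 10–15 seconds per uncached page, a 1800-second interval). This is a rough sketch that assumes a single crawler fetching pages one at a time with no cache hits; real crawls may differ if requests run in parallel or some pages are already cached.

```python
def full_crawl_hours(url_count: int, seconds_per_url: float, variants: int = 1) -> float:
    """Rough wall-clock hours to fetch every URL once per crawler variant, sequentially."""
    return url_count * variants * seconds_per_url / 3600


INTERVAL_HOURS = 1800 / 3600  # the 1800-second crawl interval from the thread

for secs in (10, 15):
    hours = full_crawl_hours(4629, secs)
    # How many crawl intervals one full sequential pass would span
    print(f"{secs} s/page: {hours:.1f} h, about {hours / INTERVAL_HOURS:.0f}x the interval")
```

On these assumptions a single pass takes roughly 13 to 19 hours, which is in the same range as the "almost a day" reported above.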
What is much more dramatic, however, is that an uncached page takes up to 15 seconds to load. Because of this, you don’t have a crawler problem; your fundamental problem is the low performance of your server. With so many URLs, you should consider upgrading your server: with your current hosting, no crawler settings can work correctly, precisely because your server is already overloaded by the amount of data.
Plugin Support
qtwrk
(@qtwrk)
So why are there so many cache misses that require multiple crawler cycles to cover the entire site?
Do you mean that some URLs in the middle are still cache misses after the crawler has run?
Did you enable options like UCSS/CCSS/LQIP/VPI?
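One way to answer this is to sample URLs after a crawler run and read the cache header LiteSpeed returns. A minimal sketch, assuming the server sends an `X-LiteSpeed-Cache` response header whose first value is `hit` or `miss`; the helper names are hypothetical. Note that each request can itself warm the cache, so only the first check after a crawl is meaningful, and a plain script may be served a different cache variant than the one the crawler warmed (depending on headers such as the user agent).

```python
import urllib.request
from typing import Dict, Iterable, Mapping, Optional


def cache_status(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-LiteSpeed-Cache status (e.g. 'hit' or 'miss'), or None if absent."""
    for name, value in headers.items():
        if name.lower() == "x-litespeed-cache":
            # Keep only the first comma-separated token, normalized to lowercase.
            return value.split(",")[0].strip().lower()
    return None


def check_urls(urls: Iterable[str]) -> Dict[str, Optional[str]]:
    """Fetch each URL once and report its cache status (performs real requests)."""
    results: Dict[str, Optional[str]] = {}
    for url in urls:
        # 100 s matches the crawler timeout mentioned earlier in the thread.
        with urllib.request.urlopen(url, timeout=100) as resp:
            results[url] = cache_status(dict(resp.headers.items()))
    return results
```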