crawler

Resolved mcdeth
(@mcdeth)

9 months, 3 weeks ago
I’m trying to understand how the crawler works. I’m using the highest-tier hosting plan on Hostinger, with guest mode and other options enabled. Server load is quite low, but the new updates don’t really allow any changes in crawler settings to run any faster.

Here’s the pic:
https://imgur.com/a/Gzu872l

There are fewer than 2,000 URLs, and the crawler hasn’t managed to warm them up in 4 days? Is it normal?

When I visit a category page, I see “Page cached by…” with the current time at the bottom. This shows even though the crawler has supposedly been running for days. I close the browser and open the page again in a fresh private window, and the timestamp updates again — so it seems neither the crawler nor my previous visit warmed the cache?

Is there any way to speed up the crawler if the hosting is limiting it? For example, can a system cron job run multiple crawlers at once?

One last question, less related to the crawler: “Default Object Lifetime”. I haven’t found much information on it, only 2 topics. Can I raise it from 6m to, say, 1h? I flush the cache on stock changes and purge the cache fully once a day. With 20k products, I’ll be watching RAM usage.

If it helps anything: Last Report Number: GPZAHKIH

Thanks for great support.
- This topic was modified 9 months, 3 weeks ago by mcdeth.

Viewing 10 replies - 1 through 10 (of 10 total)

Thread Starter mcdeth
(@mcdeth)

9 months, 3 weeks ago
I have one more question. Does this user agent:
```
Mozilla/5.0 (compatible; crawler)
```
belong to the plugin? I see this bot constantly trying to crawl the website, and I’m not sure whether I should keep it blocked.
Plugin Support qtwrk
(@qtwrk)

9 months, 3 weeks ago

yes, it’s kind of normal when you use multi-language/currency. The first access will always miss, in order to detect the language/currency and set up the cookie correctly.

Generally speaking, object data is short-term, temporary data, so there is no point in setting the TTL that high. The really important data is always stored in the database itself.

No, our plugin uses lscache_runner as its user agent.

Thread Starter mcdeth
(@mcdeth)

9 months, 3 weeks ago

> yes, it’s kind of normal when you use multi-language/currency. The first access will always miss, in order to detect the language/currency and set up the cookie correctly.

Can I exclude it somehow? I have one currency per language and a different language per directory (/en/, etc.). I’ve seen only this thread on the internet.

But as I was typing, I understood what you mean: literally only the first visit is a miss, the next pages are cache hits, and “Page cached by LiteSpeed Cache” shows yesterday’s date.

But since I’m replying anyway: is there anything that can be done about the first visit?

Plugin Support qtwrk
(@qtwrk)

9 months, 3 weeks ago

Unfortunately no. In the past I spent quite some time and effort working on that, but sorry to say, there is no viable solution to it 🙁

Thread Starter mcdeth
(@mcdeth)

9 months, 3 weeks ago

Thank you for the enlightenment. If I wanted to make my own crawler, would a simple Puppeteer script or curl do the job? I’m talking about logged-out guests and a simple website, without WPML, special styles, or cookies.

Or some spider like Screaming Frog?

Plugin Support qtwrk
(@qtwrk)

9 months, 3 weeks ago

curl is enough, just make sure you send the user agent as lscache_runner and lscache_runner Mobile iPhone
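
A minimal sketch of such a warm-up loop, in case it helps. It only assumes the two user-agent strings named above; the function name `warm_urls` and the file `urls.txt` are hypothetical, and this is not how the plugin's own crawler is implemented.

```shell
# Sketch only: request every URL once per crawler user agent.
# The user-agent strings are the two named in this reply; warm_urls and
# urls.txt are made-up names for illustration.

# Reads URLs (one per line) from stdin and prints
# "<http status> <user agent> <url>", tab-separated, for each request.
warm_urls() {
  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue   # skip blank lines
    for ua in "lscache_runner" "lscache_runner Mobile iPhone"; do
      # -o /dev/null discards the body; -w prints only the status code.
      # </dev/null keeps curl from touching the URL list on stdin.
      code=$(curl -s -o /dev/null -w '%{http_code}' \
        -H "User-Agent: $ua" "$url" </dev/null)
      printf '%s\t%s\t%s\n' "$code" "$ua" "$url"
    done
  done
}
```

It could be run as `warm_urls < urls.txt`, with one URL per line in the file. Requests are sent one at a time, which keeps the load on shared hosting predictable.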
Thread Starter mcdeth
(@mcdeth)

9 months, 3 weeks ago
Thank you—I tested this on a single-language site. Here’s the exact command I ran:
```
curl -s -D - -H "User-Agent: lscache_runner" https://website/ -o /dev/null | grep -i x-litespeed-cache
```
The first request returns:
```
x-litespeed-cache-control: public,max-age=604800
x-litespeed-cache: miss
```
After that, no x-litespeed-cache header appeared—so I assumed the cache was hit. However, when I visit the same URL in my browser, I still see “miss” on the first load (and “hit” only on the second).
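
To avoid guessing from missing output, the check can report the header state explicitly. A rough sketch, where `cache_status` is a hypothetical helper name; it reads response headers on stdin and prints the value of `x-litespeed-cache`, or `absent` when that header is not there at all:

```shell
# Sketch only: classify a response by its x-litespeed-cache header.
# cache_status is a made-up helper name; it expects raw response headers
# on stdin (for example from `curl -s -D - -o /dev/null ...`).
cache_status() {
  tr -d '\r' | awk -F': *' '
    # Match the header name exactly, so x-litespeed-cache-control is ignored.
    !found && tolower($1) == "x-litespeed-cache" { print tolower($2); found = 1 }
    END { if (!found) print "absent" }
  '
}
```

Used with the same request as above: `curl -s -D - -o /dev/null -H "User-Agent: lscache_runner" https://website/ | cache_status`. An `absent` result is then visible as such, rather than being read as a hit.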
- This reply was modified 9 months, 3 weeks ago by mcdeth.
Plugin Support qtwrk
(@qtwrk)

9 months, 3 weeks ago

If you are testing in Chrome, you need to add -H "accept: image/webp" to the curl command.
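
For reference, a sketch of the earlier check with that header added. The function name `fetch_headers_webp` is hypothetical; the header values are the ones quoted in this thread.

```shell
# Sketch only: same header check as before, plus the Accept header.
# fetch_headers_webp is a made-up name; pass the URL as the first argument.
fetch_headers_webp() {
  url="$1"
  # -D - prints response headers to stdout; the body goes to /dev/null.
  curl -s -D - -o /dev/null \
    -H "User-Agent: lscache_runner" \
    -H "accept: image/webp" \
    "$url" | grep -i '^x-litespeed-cache'
}
```

Example call: `fetch_headers_webp https://website/`.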

Thread Starter mcdeth
(@mcdeth)

9 months, 3 weeks ago

Thank you for your time and patience. I couldn’t get it to work with either curl or the Screaming Frog crawler, so I’ll just stick with the default crawler or this one: https://www.cachecrawler.com/

Thread Starter mcdeth
(@mcdeth)

9 months, 2 weeks ago

One more question regarding the multi-language limitation: is it the same for every multilanguage plugin (Polylang, for example)?


The topic ‘crawler’ is closed to new replies.