Hello @vielhuber
Thank you for reaching out, and I am happy to assist you with this.
Here is what the Cache Preload feature does:
– Check the last offset (where the previous run stopped)
– Check how many URLs are allowed to be processed per run
– Fetch all URLs from the sitemap (nested sitemaps work too)
– Loop through a set of URLs to visit them
If the end of the list is reached, the next run starts again from the beginning.
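The offset-and-batch logic in the list above can be sketched roughly as follows. This is a minimal illustration that assumes the sitemap has already been flattened into a plain list of URLs; `preload_next_batch()` is a hypothetical name, not the plugin's actual function.

```php
<?php
/**
 * Hypothetical sketch of one preload run (not W3 Total Cache's real code).
 * Returns the URLs to visit in this run and the offset to store for the next run.
 *
 * @param string[] $urls   all URLs collected from the sitemap(s)
 * @param int      $offset offset saved by the previous run
 * @param int      $limit  pages allowed per run ("pages per interval")
 * @return array{0: string[], 1: int}
 */
function preload_next_batch(array $urls, int $offset, int $limit): array
{
    // Take at most $limit URLs starting at the saved offset.
    $batch = array_slice($urls, $offset, $limit);
    $next  = $offset + count($batch);

    // Once the end of the list is reached, the next run starts from the beginning.
    if ($next >= count($urls)) {
        $next = 0;
    }

    return [$batch, $next];
}

$urls = ['/', '/about/', '/blog/', '/contact/', '/shop/'];
[$batch, $next] = preload_next_batch($urls, 4, 2);
// $batch is ['/shop/'] and $next is 0, so the following run starts over at '/'.
```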
Calling the WP-CLI command does the same thing as the cron job, but you can optionally specify where this run should start and how many pages it should visit, instead of using the configured values.
If your cron job runs every day instead of every minute, only one run can happen per day. If the cron job runs every minute and the interval is set to a higher value, the plugin checks on each call whether the configured interval has passed since the last run.
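The interaction between the cron schedule and the configured interval can be pictured like this. Both function names are hypothetical, and the sketch assumes the cron job fires at exact multiples of its period; it only illustrates why a daily cron job limits preloading to one run per day regardless of the configured interval.

```php
<?php
// Hypothetical sketch: has the configured interval passed since the last run?
function interval_elapsed(int $now, int $lastRun, int $interval): bool
{
    return ($now - $lastRun) >= $interval;
}

// Effective seconds between preload runs when cron fires every $cronPeriod seconds.
// A run happens on the first cron tick at or after $lastRun + $interval.
function effective_period(int $cronPeriod, int $interval): int
{
    if ($cronPeriod < 1) {
        throw new InvalidArgumentException('cronPeriod must be at least 1 second');
    }
    $ticks = max(1, (int) ceil($interval / $cronPeriod));
    return $ticks * $cronPeriod;
}

echo effective_period(60, 900), "\n";    // 900: cron every minute, 15-minute interval
echo effective_period(86400, 900), "\n"; // 86400: cron once a day, so one run per day
```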
One specific thing I should mention: the cache preload may also be triggered by a cache purge. So if you only see _old files in the folder, something (such as a plugin or the theme) must be purging the cache, and you should check the page cache purge log in the Debug section of the General Settings.
Thanks!
Hello!
Thanks for the detailed explanation.
The point is:
– Loop through a set of URLs to visit them
If a URL is not “expired” and has already been cached, the crawler does not update its cache (and leaves the old version in place).
If I set a lower expiration, another job deletes or renames the cached version, which results in a gap during which no cached version is available until the crawler caches the page again.
I want the crawler to update the cached version directly, so that there are no gaps without a cached version.
Is this somehow possible?
This reply was modified 3 years, 5 months ago by vielhuber.
After digging further into the code, this does not seem to be possible.
The two jobs run completely independently: the cleanup job deletes all cached files, and later the prime job recreates them.
To get fresh files (no older than X minutes), the cleanup job must delete them first.
That means there is always a gap during which some users get uncached pages.
Or is there a way, using “Browser Cache” and “Page Cache: Enhanced”, to set the expiration headers of the HTML files to, e.g., 5 minutes, so that the prime job updates HTML files that have not been cleaned up beforehand?
Hello @vielhuber
The Garbage collection interval is used to delete _old files, which are created when the cache is purged.
It’s OK to have _old files. If you only ever have _old files, that’s a sign that something, such as a theme or a plugin, is purging the cache all the time.
If the cache is not purged, the cached page will not expire, and the next preload run will overwrite the existing file.
Thanks!
Thanks for your answer!
So let’s please set the _old files aside.
I have also set “Garbage collection interval:” to a very high number so that job doesn’t confuse things for now.
I have “Page Cache: Disk enhanced” enabled and the start page is cached (in the /wp-content/cache/page_enhanced/…/ folder):
_index_slash_ssl.html
Creation time: 2022-12-05 23:26:17
Now I run the job w3_pgcache_prime.
(Pages per interval is set to a very high number, so the start page must be included in the next run.)
But the existing HTML file _index_slash_ssl.html is not updated!
The reason for that is clear:
In PgCache_Plugin_Admin.php, the prime() function (around line 162) simply requests the URLs one by one:
// use 'WordPress' since by default we use W3TC-powered by
// which blocks caching
foreach ( $queue as $url ) {
    Util_Http::get( $url, array( 'user-agent' => 'WordPress' ) );
    if ( !is_null( $log_callback ) ) {
        $log_callback( 'Priming ' . $url );
    }
}
Since each request finds an already cached version, the page is served from the cache and the cached file is not overwritten.
So the behaviour you describe does not exist, or am I wrong?
When I then run w3_pgcache_cleanup, _index_slash_ssl.html gets renamed to _index_slash_ssl.html_old and a new _index_slash_ssl.html is created after the next prime.
That’s not what I want: I don’t want a gap.
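The sequence described in this post (a prime request that hits the existing file, then cleanup renaming it to _old, then a miss) can be modeled in a few lines. This is a deliberately simplified in-memory model, not W3 Total Cache code: `PageCacheModel` and its methods are hypothetical, and the real Disk: Enhanced cache works with files on disk.

```php
<?php
// Simplified, hypothetical model of the page cache behaviour described in this thread.
final class PageCacheModel
{
    /** @var array<string, string> cache file name => cached HTML */
    private array $files = [];
    private int $renders = 0;

    // A visitor request or a prime request (Util_Http::get) both end up here.
    public function request(string $key): string
    {
        if (isset($this->files[$key])) {
            return 'HIT'; // served from the cache; the file is left untouched
        }
        $this->renders++;
        $this->files[$key] = "render #{$this->renders}"; // page rendered, then written
        return 'MISS';
    }

    // The cleanup job: the live file is renamed to <name>_old.
    public function cleanup(string $key): void
    {
        if (isset($this->files[$key])) {
            $this->files[$key . '_old'] = $this->files[$key];
            unset($this->files[$key]);
        }
    }

    public function content(string $key): ?string
    {
        return $this->files[$key] ?? null;
    }
}

$cache = new PageCacheModel();
$key = '_index_slash_ssl.html';
echo $cache->request($key), "\n"; // MISS: page rendered and written
echo $cache->request($key), "\n"; // HIT: priming leaves the existing file as-is
$cache->cleanup($key);            // renamed to _index_slash_ssl.html_old
echo $cache->request($key), "\n"; // MISS: this request falls into the gap
```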
Hello @vielhuber
Thank you for your patience and sorry for the late reply.
Priming only primes the cache; it does not regenerate existing entries.
That’s by design: the idea of priming is to create the cache so that when users arrive, they get a cached page.
So yes, you are correct about this, and the “gap” you are referring to cannot be avoided.
I’ll discuss this with the team; regeneration could become a new feature if it is implemented.
However, it may not be straightforward because of groups (such as user agent groups and cookie groups).
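To illustrate why groups complicate regeneration: if each user agent group and cookie group gets its own cache entry for the same URL, regenerating a page takes one request per combination. The sketch assumes exactly that; `regeneration_requests()` is hypothetical, and the group names, user-agent strings, and cookie value are made up for the example.

```php
<?php
// Hypothetical: build one regeneration request per URL x UA group x cookie group.
function regeneration_requests(array $urls, array $uaGroups, array $cookieGroups): array
{
    $requests = [];
    foreach ($urls as $url) {
        foreach ($uaGroups as $uaGroup => $userAgent) {
            foreach ($cookieGroups as $cookieGroup => $cookie) {
                $requests[] = [
                    'url'        => $url,
                    'group'      => $uaGroup . '/' . $cookieGroup,
                    'user-agent' => $userAgent,
                    'cookie'     => $cookie,
                ];
            }
        }
    }
    return $requests;
}

$requests = regeneration_requests(
    ['/', '/blog/'],
    ['default' => 'Mozilla/5.0 (Windows NT 10.0)', 'mobile' => 'Mozilla/5.0 (iPhone)'],
    ['default' => '', 'members' => 'example_member=1'] // made-up cookie group
);
echo count($requests), "\n"; // 8 = 2 URLs x 2 UA groups x 2 cookie groups
```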
Thanks!
Hi guys!
I understand what @vielhuber is saying. I am also trying to eliminate that caching gap.
The plugin loops through the expired cached pages, but it’s slow. Why? Because the number of pages per interval is limited to avoid crashing the server.
So, there is a trick: at first, when the cache is empty, use for example 50 pages per interval (or whatever number you normally use). The server caches all pages. Then turn it up to 200 pages per interval or even higher. Why? Because all pages are already cached, you won’t overload your server; you just let it loop quickly through all pages and find and regenerate the expired pages sooner.
I just tried it with 2,500 pages per interval and an update interval of 60 seconds (on shared hosting), and it works great! :)
The only drawback is that you can’t flush the cache, because the server will be overloaded when the plugin tries to rebuild the empty page cache! Before flushing, set “pages per interval” back to the lower value!
This still leaves gaps in the cache, but it reduces the number of expired pages.
Give it a try! :)
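Rough arithmetic shows why raising “pages per interval” helps: the preload needs ceil(N / pages per interval) runs to get through N URLs, so a page removed from the cache can wait up to about one full pass before the preload visits it again. The sketch assumes exactly one run per interval and ignores how long each run takes; `full_pass_seconds()` is a hypothetical helper and the 10,000-URL sitemap is a made-up figure.

```php
<?php
// Hypothetical helper: seconds needed for the preload to visit every URL once.
function full_pass_seconds(int $totalUrls, int $pagesPerInterval, int $intervalSeconds): int
{
    if ($pagesPerInterval < 1) {
        throw new InvalidArgumentException('pagesPerInterval must be at least 1');
    }
    $runs = (int) ceil($totalUrls / $pagesPerInterval);
    return $runs * $intervalSeconds;
}

// Made-up sitemap with 10,000 URLs and a 60-second interval:
echo full_pass_seconds(10000, 50, 60) / 60, " minutes\n";   // 200 minutes (200 runs)
echo full_pass_seconds(10000, 2500, 60) / 60, " minutes\n"; // 4 minutes (4 runs)
```

The trade-off from the post shows up here too: with 2,500 pages per interval, an empty cache would mean up to 2,500 uncached renders in a single run, which is why the setting should be lowered before flushing.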
This reply was modified 3 years, 3 months ago by Zoltan Baffy.