Page Cache: Preload and overwrite existing files

Resolved vielhuber
(@vielhuber)

3 years, 5 months ago
Hello!

My aim is to preload the cache for all pages of the sitemap.

When this is finished, the cron job should redo the same thing every 10 minutes, over and over again, overwriting the existing cached files.

In my view this is the best strategy: always have all pages cached, and never have them older than 10 minutes.

However, there does not seem to be a configuration that makes this possible.

The closest I got is (for example if I have 50 pages in total):

– Update interval: 300 seconds
– Pages per interval: 50

This alone does not seem to help. I also have to set:

– Maximum lifetime of cache objects: 300 seconds
– Garbage collection interval: 300 seconds

However, every 5 minutes the complete cache then seems to vanish and needs to be rebuilt. I just want the crawler to recache page by page and update the content.

Is this somehow possible?
- This topic was modified 3 years, 5 months ago by vielhuber.

Viewing 7 replies - 1 through 7 (of 7 total)

Plugin Contributor Marko Vasiljevic
(@vmarko)

3 years, 5 months ago

Hello @vielhuber

Thank you for reaching out; I am happy to assist you with this.

What the Cache Preload feature does is this:
– Check the last offset
– Check how many URLs are allowed to be processed per run
– Fetch all URLs from the sitemap (even nested sitemaps work)
– Loop through a set of URLs to visit them

If the end of the list is reached, it will start from the beginning in the next run.

Calling the wp-cli command is the same as the cron job, but if you want, you can specify where to start this run and how many pages should be visited (instead of using the configuration values).

If your cron job is running every day instead of every minute, only one run can be done per day. If the cron job is set to every minute and the interval is set to a higher value, the plugin will check whether the interval specified in the configuration has passed since the last run.
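
The preload steps described above can be sketched roughly as follows. This is a minimal Python illustration, not the plugin's code: the function names `next_batch` and `interval_elapsed` and the example URLs are hypothetical, and the wrap-around is modelled the way the description reads (once the offset has reached the end of the list, the next run starts from the beginning).

```python
from typing import List, Tuple


def next_batch(urls: List[str], offset: int, limit: int) -> Tuple[List[str], int]:
    """Return the URLs for this run and the offset to store for the next run."""
    if not urls or limit <= 0:
        return [], offset
    if offset >= len(urls):
        offset = 0  # the end of the list was reached: start from the beginning
    batch = urls[offset:offset + limit]
    return batch, offset + len(batch)


def interval_elapsed(now: float, last_run: float, interval: float) -> bool:
    """True when the configured update interval has passed since the last run."""
    return now - last_run >= interval


# Hypothetical sitemap with five URLs and two pages per run.
sitemap = [f"https://example.com/page-{i}/" for i in range(1, 6)]
offset = 0
runs = []
for _ in range(4):
    batch, offset = next_batch(sitemap, offset, limit=2)
    runs.append(batch)
# Runs 1 to 3 cover pages 1-2, 3-4 and 5; run 4 starts over with pages 1-2.
```

A cron job that fires more often than the configured interval would simply return early whenever `interval_elapsed(...)` is false.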

Some specific things that I should mention: Cache Preload may also be triggered by a cache purge. So if you are experiencing the problem of having only _old files in the folder, something must be triggering the cache purge, such as a plugin or theme. You should check this in the purge log for page cache in the Debug section of the General settings.

Thanks!
Thread Starter vielhuber
(@vielhuber)

3 years, 5 months ago
Hello!

Thanks for the detailed explanation.

The point is:

– Loop through a set of URLs to visit them

If a URL is not “expired” and has already been cached, the crawler does not update its cache (and leaves the old version in place).

If I set the expiration lower, another job deletes or renames the cached version, which results in a gap (where no cached version is available until the crawler caches the page again).

I want the crawler to update the cached version in the first place, so that there are no gaps without a cached version.
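
To make the wish concrete: a gap-free update means the old file stays in place until the fresh copy is complete, and is then swapped in a single step. The following is a generic Python sketch of that write-then-rename technique. It is not how the plugin works, and the helper name `overwrite_atomically` is hypothetical.

```python
import os
import tempfile


def overwrite_atomically(path: str, html: str) -> None:
    """Replace the file at `path` with `html` without a window where it is missing."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the fresh copy next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(html)
        # Atomic on POSIX: readers see either the old file or the new one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Usage with a throwaway directory; the file name mirrors the cache naming in this thread.
with tempfile.TemporaryDirectory() as cache_dir:
    cached = os.path.join(cache_dir, "_index_slash_ssl.html")
    overwrite_atomically(cached, "<html>old</html>")
    overwrite_atomically(cached, "<html>new</html>")
    with open(cached, encoding="utf-8") as fh:
        final_content = fh.read()
    leftover_files = os.listdir(cache_dir)
```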

Is this somehow possible?
- This reply was modified 3 years, 5 months ago by vielhuber.
Thread Starter vielhuber
(@vielhuber)

3 years, 5 months ago
After investigating the code further, this does not seem possible.

The two jobs run completely independently: the cleanup job deletes all cached files and, later on, the prime job recreates them.

In order to get fresh files (no older than X minutes), the cleanup job must delete them first.

That means there is always a gap in which some users get uncached pages.

Or is there a possibility, with “Browser Cache” and “Page Cache: Enhanced”, to set the expiration headers of the HTML files to e.g. 5 minutes, so that the prime job updates HTML files that were not cleaned up beforehand?
- This reply was modified 3 years, 5 months ago by vielhuber.
Plugin Contributor Marko Vasiljevic
(@vmarko)

3 years, 5 months ago

Hello @vielhuber

The Garbage collection interval is used to delete the _old files. _old files are created when the cache is purged.
It’s OK to have _old files. If you always have only _old files, that’s a sign that something, like a theme or some plugin, purges the cache all the time.
If the cache is not purged, the cached page will not expire and the next preload run will overwrite the existing file.
Thanks!
Thread Starter vielhuber
(@vielhuber)

3 years, 5 months ago
Thanks for your answer!

So please let’s forget about the _old files.
I have also set “Garbage collection interval” to a very high number, so that this job does not confuse things at the moment.

I have “Page Cache: Disk enhanced” enabled and the start page is cached (in the /wp-content/cache/page_enhanced/…/ folder):

_index_slash_ssl.html
Creation time: 2022-12-05 23:26:17

Now I run the job w3_pgcache_prime.

(Pages per interval is set to a very high number, so the start page must be included in the next run.)

But the already existing html file _index_slash_ssl.html does not change/update!

The reason for that is clear:
In PgCache_plugin_Admin.php there is the prime() function, which simply requests the URLs one by one:

Line: 162
```
// use 'WordPress' since by default we use W3TC-powered by
// which blocks caching
foreach ( $queue as $url ) {
    Util_Http::get( $url, array( 'user-agent' => 'WordPress' ) );

    if ( !is_null( $log_callback ) ) {
        $log_callback( 'Priming ' . $url );
    }
}
```
Since it finds an already cached version, it does not overwrite the cached file.

So the behaviour you are mentioning does not exist, or am I wrong?

When I then run w3_pgcache_cleanup, _index_slash_ssl.html gets renamed to _index_slash_ssl.html_old and a new _index_slash_ssl.html is created after the next prime.

That’s not what I want: I don’t want a gap.
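
The sequence described above can be reproduced with a small toy model. This is a Python sketch, not plugin code: the class `PageCacheModel` and its methods are hypothetical, and `cleanup()` simply assumes that every cached file is treated as expired and set aside as an _old copy.

```python
from typing import Callable, Dict, List


class PageCacheModel:
    """Toy model of the behaviour described in this thread (not plugin code)."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self.render = render
        self.files: Dict[str, str] = {}      # url -> cached HTML
        self.old_files: Dict[str, str] = {}  # url -> copy renamed to *_old
        self.renders = 0

    def request(self, url: str) -> str:
        """A visit: serve the cached file on a hit, render and store on a miss."""
        if url in self.files:
            return self.files[url]           # hit: the file is left untouched
        self.renders += 1
        self.files[url] = self.render(url)
        return self.files[url]

    def prime(self, queue: List[str]) -> None:
        """Priming only requests each URL, so hits are never regenerated."""
        for url in queue:
            self.request(url)

    def cleanup(self) -> None:
        """Assumption: every cached file is expired and set aside as an _old copy."""
        self.old_files.update(self.files)
        self.files.clear()


version = {"n": 1}
cache = PageCacheModel(lambda url: f"{url} v{version['n']}")
cache.prime(["/"])            # first prime creates the file
version["n"] = 2              # the page content changes
cache.prime(["/"])            # second prime is a hit: the v1 file stays
stale = cache.files["/"]
cache.cleanup()               # the file becomes an _old copy
gap = "/" not in cache.files  # until the next prime, visitors hit an uncached page
cache.prime(["/"])            # only now is the v2 file written
fresh = cache.files["/"]
```

In this model the page is rendered only twice across three prime runs, and the fresh version appears only after the cleanup has opened a gap.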
- This reply was modified 3 years, 5 months ago by vielhuber.
Plugin Contributor Marko Vasiljevic
(@vmarko)

3 years, 5 months ago

Hello @vielhuber

Thank you for your patience and sorry for the late reply.
Priming only primes; it does not regenerate.
That’s how it should be, since the idea of priming is to create the cache so that once users come, they get a cached page.
So yes, you are correct about this and the “gap” you are referring to cannot be avoided.

I’ll have a chat with the team and this could be a new feature if regeneration is going to be implemented.
However, it may not be so straightforward due to groups (like user agent groups and cookie groups).

Thanks!
Zoltan Baffy
(@macika)

3 years, 3 months ago
Hi guys!

I get what @vielhuber is saying. I am also trying to eliminate that caching gap.
The plugin loops through the expired cached pages, but it’s slow. Why? Because the number of pages per interval is limited to avoid crashing the server.

So, there is a trick: at first, when the cache is empty, use for example 50 pages per interval (or whatever number you normally use) and let the server cache all pages. Then turn it up to 200 pages per interval or even higher. Why? Because all pages are already cached, so you won’t overload your server; you just let the crawler quickly loop through all pages and find and regenerate the expired ones sooner.

I just tried it with 2,500 pages per interval, and the update interval is 60 seconds (on shared hosting) and it works great! 🙂

The only drawback is that you can’t flush the cache, because the server will be overloaded when the plugin tries to rebuild the empty page cache! Before flushing, set “pages per interval” back!

This also leaves gaps in the cache, but lowers the number of expired pages.
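
To put rough numbers on this trick: one full pass over the sitemap takes about ceil(total pages / pages per interval) × update interval. A minimal Python sketch, assuming a hypothetical site of 2,500 URLs (the post gives only the pages-per-interval setting, not the site size) and ignoring the time the requests themselves take:

```python
import math


def full_pass_seconds(total_pages: int, pages_per_interval: int, update_interval: int) -> int:
    """Seconds the crawler needs to visit every sitemap URL once."""
    runs = math.ceil(total_pages / pages_per_interval)
    return runs * update_interval


# Hypothetical site with 2,500 URLs and a 60-second update interval.
slow = full_pass_seconds(2500, 50, 60)    # 50 runs: 3000 seconds (50 minutes)
fast = full_pass_seconds(2500, 2500, 60)  # 1 run: 60 seconds
```

Under these assumptions, an expired page waits up to 50 minutes for its turn at 50 pages per interval, but only about a minute at 2,500.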

Give it a try! 🙂
- This reply was modified 3 years, 3 months ago by Zoltan Baffy.


The topic ‘Page Cache: Preload and overwrite existing files’ is closed to new replies.