Enhanced Page cache creates recursive

Resolved Lyk
(@lyk-1)

1 year, 6 months ago
Hello,

We have noticed a strange case where W3 Total Cache creates huge, recursively nested directories. If they grow too large, they have to be deleted over FTP because they crash the clear cache functionality.

The offending files are under /wp-content/cache/page_enhanced/domain/search.
1. The site has no /search/ URL. It is a 404. The search functionality is the default one, using ?s=. But /search/bla would search for bla. I guess this is default behavior?
2. Inside the /search/ directory, there are many directories like:
```
%2525252525252525252525252525257bsearch_term_string%2525252525252525252525252525257d/
%25252525252525252525252525257bsearch_term_string%25252525252525252525252525257d/
%252525252525252525252525257bsearch_term_string%252525252525252525252525257d/
%2525252525252525252525257bsearch_term_string%2525252525252525252525257d/
%25252525252525252525257bsearch_term_string%25252525252525252525257d/
%252525252525252525257bsearch_term_string%252525252525252525257d/
%2525252525252525257bsearch_term_string%2525252525252525257d/
%25252525252525257bsearch_term_string%25252525252525257d/
%252525252525257bsearch_term_string%252525252525257d/
%2525252525257bsearch_term_string%2525252525257d/
%25252525257bsearch_term_string%25252525257d/
%252525257bsearch_term_string%252525257d/
%2525257bsearch_term_string%2525257d/
%25257bsearch_term_string%25257d/
%257bsearch_term_string%257d/
%7bsearch_term_string%7d/
label/--
```
This pattern keeps growing. Inside each directory there are more like:
```
/wp-content/cache/page_enhanced/www.__domain__/search/%7bsearch_term_string%7d/__domain__/__domain__/privacy-policy/__domain__/__domain__/__domain__/__domain__/__domain__/__domain__/privacy-policy/__domain__/privacy-policy/__domain__/privacy-policy
```
In some of those (even the ones in the middle) there are 2 folders inside, 1 with the domain name/site and 1 named privacy-policy. After randomly “walking” through these, at some point you reach the .html and .html_gzip files. That file is a page with the search results, as if the search query was:
```
{search_term_string}/__domain__/__domain__/privacy-policy/__domain__/__domain__/__domain__/__domain__/__domain__/__domain__/privacy-policy/__domain__/privacy-policy/__domain__/privacy-policy/__domain__/privacy-policy/__domain__/__domain__/__domain__/privacy-policy
```
I removed the domain, but the {search_term_string} is literally like that. It is as if there are many searches under https://__domain__/search/{search_term_string}/__domain__/__domain__/privacy-policy/....
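
For reference, the directory names listed above are consistent with the same placeholder being percent-encoded again and again, since %25 is the encoding of the % character itself. The Python sketch below only illustrates that arithmetic; it does not show what actually re-encodes the URL on the site, and the helper names are made up for this example. The literal {search_term_string} also resembles the placeholder commonly used in schema.org SearchAction markup, so crawlers requesting that URL template unfilled might be the source, but that is only a guess.

```python
from urllib.parse import quote, unquote

def encode_times(text: str, rounds: int) -> str:
    """Percent-encode `text` `rounds` times, lowercasing the result
    so it matches the style of the directory names in this thread."""
    for _ in range(rounds):
        text = quote(text, safe="")
    return text.lower()

def decode_fully(name: str) -> tuple[str, int]:
    """Undo percent-encoding until the name stops changing.
    Returns the decoded text and the number of decoding rounds needed."""
    rounds = 0
    while True:
        decoded = unquote(name)
        if decoded == name:
            return name, rounds
        name, rounds = decoded, rounds + 1

placeholder = "{search_term_string}"
for n in range(1, 4):
    # Each extra round turns every "%" into "%25".
    print(encode_times(placeholder, n))

# Count how many encoding layers one of the directory names carries.
print(decode_fully("%25257bsearch_term_string%25257d"))
```

Running decode_fully over the directory names would show how many encoding layers each one carries, which may help when checking the access logs for the requests that created them.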

Any clues what is happening? Why are all these cached?

I guess I could disable page cache for anything under /search/ but this does not really explain what is happening. I could also disable WordPress search since it is not needed for that site, but I would prefer not to.

Thank you!

PS: the option Cache URIs with query string variables is unchecked (and cannot actually be enabled)
- This topic was modified 1 year, 6 months ago by Lyk.

Viewing 8 replies - 1 through 8 (of 8 total)

Plugin Contributor Marko Vasiljevic
(@vmarko)

1 year, 6 months ago

Hello @lyk-1

Thank you for reaching out and I am happy to help!

Can you please check if there are any strings added to the Performance > Page Cache > Advanced > Accepted query strings field?
If so, can you please manually delete the folder in question, remove the strings, save all settings, purge the cache, and let me know if the problem persists?

Thanks!
Thread Starter Lyk
(@lyk-1)

1 year, 6 months ago
Hello @vmarko ,

the Accepted query strings are the following, which I believe are the default ones after the setup:
```
_branch_match_id
_bta_c
_bta_tid
_ga
_gl
_ke
adgroupid
adid
age-verified
ao_noptimize
campaignid
campid
cn-reloaded
customid
dm_i
ef_id
epik
fb_action_ids
fb_action_types
fb_source
fbclid
gclid
gclsrc
gdffi
gdfms
gdftrk
hsa_acc
hsa_ad
hsa_cam
hsa_grp
hsa_kw
hsa_mt
hsa_net
hsa_src
hsa_tgt
hsa_ver
igshid
matomo_campaign
matomo_cid
matomo_content
matomo_group
matomo_keyword
matomo_medium
matomo_placement
matomo_source
mc_cid
mc_eid
mkcid
mkevt
mkrid
mkwid
msclkid
mtm_campaign
mtm_cid
mtm_content
mtm_group
mtm_keyword
mtm_medium
mtm_placement
mtm_source
pcrid
piwik_campaign
piwik_keyword
piwik_kwd
pk_campaign
pk_cid
pk_content
pk_keyword
pk_kwd
pk_medium
pk_source
pp
redirect_log_mongo_id
redirect_mongo_id
ref
s_kwcid
sb_referer_host
si
sscid
toolid
trk_contact
trk_module
trk_msg
trk_sid
usqp
utm_campaign
utm_content
utm_expid
utm_id
utm_medium
utm_source
utm_term
```
> manually delete the folder in question and remove the strings, save all settings and purge the cache and let me know if the problem persists.

I have deleted the folder a couple of times during the last few weeks and the problem persists. Do you mean checking it again, but without any strings in the setting? I just did this and will report back.
But most of them should be cached, e.g. the utm_ ones, etc.

I also noticed a search term in one of the cached pages, and it looked like bots checking for vulnerabilities. But the main question is whether /search/ pages are cached by default, and why the weird recursive structure appears.

My current theory is the following:
Bots are spamming/checking the search for vulnerabilities. Instead of checking with GET /?s=test/test/test, they do it via /search/test/test/test. Both URLs search for test/test/test. But the latter confuses the page cache mechanism because it looks the same as any other path. So each brute-force query is saved as a separate cached page.
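
A minimal Python sketch of that theory, assuming the cache derives its directory from the host and path alone, as the paths quoted earlier in this thread suggest. The function name cache_dir_for and the example.com URLs are hypothetical, and this is not W3 Total Cache's actual code.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Cache root modelled on the paths quoted in this thread (assumption).
CACHE_ROOT = PurePosixPath("/wp-content/cache/page_enhanced")

def cache_dir_for(url: str) -> PurePosixPath:
    """Map a URL to a cache directory using only its host and path.
    In this sketch the query string does not contribute to the directory."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return CACHE_ROOT.joinpath(parts.netloc, *segments)

for url in [
    "https://example.com/?s=test/test/test",           # query-string search
    "https://example.com/search/test/test/test",       # path-style search
    "https://example.com/search/test/test/test/more",  # one segment deeper
]:
    print(cache_dir_for(url))
```

Under this assumption every distinct path-style search gets its own nested directory, while the ?s= form never leaves the site root, which would match the growth seen under /search/.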

Any thoughts?
- This reply was modified 1 year, 6 months ago by Lyk.
Thread Starter Lyk
(@lyk-1)

1 year, 6 months ago

Hello again,

Even without any strings under Accepted query strings, the issue keeps happening.

I have now tried excluding /search* via Never cache the following pages and will report back.
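
A rough Python sketch of how such an exclusion entry might behave, assuming each entry is applied as a regular expression searched against the request URI. That is an assumption about the plugin, not something confirmed in this thread, and is_excluded and the stricter pattern are hypothetical.

```python
import re

def is_excluded(request_uri: str, patterns: list[str]) -> bool:
    """Return True if any exclusion pattern is found in the request URI.
    Assumption: entries are regular expressions; real matching rules may differ."""
    return any(re.search(p, request_uri) for p in patterns)

loose = ["/search*"]     # the entry as written in this thread
strict = [r"^/search/"]  # hypothetical tighter alternative

for uri in [
    "/search/test/test/test",
    "/search/%7bsearch_term_string%7d/",
    "/blog/searching-tips",
    "/privacy-policy/",
]:
    print(uri, is_excluded(uri, loose), is_excluded(uri, strict))
```

Read as a regular expression, /search* means "/searc" followed by any number of "h", so it would also exclude unrelated URLs such as /blog/searching-tips. If that matters on a given site, an anchored pattern like the strict one may be worth testing.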
Thread Starter Lyk
(@lyk-1)

1 year, 6 months ago
Excluding the /search* seems to fix the issue permanently.

So from the plugin’s side, it seems that the issue is that searches like
/search/test/test/test (which is the same as /?s=test/test/test in WordPress) confuse the page cache mechanism because they look the same as any other path.

Any future fixes for that?

Thank you!
- This reply was modified 1 year, 6 months ago by Lyk. Reason: added a tag
Plugin Contributor Marko Vasiljevic
(@vmarko)

1 year, 6 months ago

Hello @lyk-1

Thank you for your feedback.
Ah, I see, so W3 Total Cache does not know that the page is a 404.
The pages are being cached instead of being treated as 404s.
That means either the “cache 404” option is enabled in Performance > Page Cache or, if that is not the case, the website works in such a way that the WordPress is_404() call returns false for 404 pages, which is not correct. As a result, W3TC doesn’t know it’s a 404 and caches it.
I hope this helps!

Thanks!

Thread Starter Lyk
(@lyk-1)

1 year, 6 months ago

Hello @vmarko,

No, it is not about 404s. A search results page with no results is not a 404; it is a normal page with a 200 status.

It is about searching via the endpoint example.com/search/test/test/test.
In WordPress, this searches for test/test/test.

This case is not handled correctly by W3 Total Cache.

Plugin Contributor Marko Vasiljevic
(@vmarko)

1 year, 6 months ago

Hello @lyk-1

Thank you for your feedback.
Maybe I misunderstood the information you provided:
> The site has no /search/ url. It is a 404.

This is very interesting indeed, because we have not had this kind of issue reported before.
I’ll make sure to pass this information to the dev team so we can check this more.

For the time being, please exclude this path from the cache.

Thanks!

Thread Starter Lyk
(@lyk-1)

1 year, 6 months ago

Hello again,

Indeed the domain/search/ does not exist and it is a 404. But the domain/search/_term_ is being treated by WordPress as a search query and returns a page with the results for _term_.
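
The difference can be sketched in Python, assuming WordPress resolves pretty search URLs with a rewrite rule roughly of the form search/(.+)/?$ mapped to index.php?s=... . The exact rule set on a given site may differ, and search_term_for is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed approximation of the pretty-permalink search rule:
# "search/(.+)/?$" rewritten to "index.php?s=<captured term>".
SEARCH_RULE = re.compile(r"^search/(.+)/?$")

def search_term_for(path: str) -> Optional[str]:
    """Return the search term a path would resolve to,
    or None when the rule does not match (e.g. a bare /search/)."""
    match = SEARCH_RULE.match(path.lstrip("/"))
    return match.group(1) if match else None

print(search_term_for("/search/"))                # no term after the base
print(search_term_for("/search/test/test/test"))  # everything after the base
```

Because (.+) needs at least one character, a bare /search/ has nothing to capture and falls through to a 404, while anything after the base, slashes included, becomes the search term.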

Yes, I will keep it excluded for now. Please let us know when we have any relevant updates from the dev team.

Thank you!


The topic ‘Enhanced Page cache creates recursive’ is closed to new replies.
