• Resolved Lyk

    (@lyk-1)


    Hello,

    We have noticed a strange case where W3 Total Cache creates huge, recursively nested directories. If they grow too large, they have to be deleted via FTP, because they crash the clear-cache functionality.

    The offending files are under /wp-content/cache/page_enhanced/domain/search.

    1. The site has no /search/ URL; it returns a 404. Search uses the default ?s= parameter, but /search/bla also searches for "bla". I guess this is default WordPress behavior?
    2. Inside the /search/ directory, there are many directories like:
    %2525252525252525252525252525257bsearch_term_string%2525252525252525252525252525257d/
    %25252525252525252525252525257bsearch_term_string%25252525252525252525252525257d/
    %252525252525252525252525257bsearch_term_string%252525252525252525252525257d/
    %2525252525252525252525257bsearch_term_string%2525252525252525252525257d/
    %25252525252525252525257bsearch_term_string%25252525252525252525257d/
    %252525252525252525257bsearch_term_string%252525252525252525257d/
    %2525252525252525257bsearch_term_string%2525252525252525257d/
    %25252525252525257bsearch_term_string%25252525252525257d/
    %252525252525257bsearch_term_string%252525252525257d/
    %2525252525257bsearch_term_string%2525252525257d/
    %25252525257bsearch_term_string%25252525257d/
    %252525257bsearch_term_string%252525257d/
    %2525257bsearch_term_string%2525257d/
    %25257bsearch_term_string%25257d/
    %257bsearch_term_string%257d/
    %7bsearch_term_string%7d/
    label/--
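The %25 nesting above is consistent with the same URL being percent-encoded once more on every hop: each pass turns the previous "%" into "%25". The sketch below reproduces the innermost directory names using only Python's standard `urllib.parse.quote`; it does not show *which* client performs the re-encoding, which this thread does not establish. The `encode_n_times` helper is hypothetical.

```python
from urllib.parse import quote


def encode_n_times(text: str, n: int) -> str:
    """Percent-encode `text` n times in a row.

    safe="" makes quote() encode every reserved character, including "%",
    so each extra pass turns the previous "%" into "%25". The result is
    lowercased only to match the directory names in the listing.
    """
    for _ in range(n):
        text = quote(text, safe="")
    return text.lower()


placeholder = "{search_term_string}"
for n in range(1, 5):
    print(encode_n_times(placeholder, n))
```

Passes 1 to 4 give the last four directory names in the listing, read from the bottom up; each further pass adds one more "25".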

    This pattern keeps growing. Inside each directory there are more, like:

    /wp-content/cache/page_enhanced/www.__domain__/search/%7bsearch_term_string%7d/__domain__/__domain__/privacy-policy/__domain__/__domain__/__domain__/__domain__/__domain__/__domain__/privacy-policy/__domain__/privacy-policy/__domain__/privacy-policy

    Some of these directories (even ones in the middle of the chain) contain two subfolders: one named after the domain and one named privacy-policy. After randomly “walking” through them, at some point there are .html and .html_gzip files. Each one is a search results page, as if the search query had been:

    {search_term_string}/__domain__/__domain__/privacy-policy/__domain__/__domain__/__domain__/__domain__/__domain__/__domain__/privacy-policy/__domain__/privacy-policy/__domain__/privacy-policy/__domain__/privacy-policy/__domain__/__domain__/__domain__/privacy-policy

    I replaced the domain, but {search_term_string} appears literally like that. It is as if there were many searches under https://__domain__/search/{search_term_string}/__domain__/__domain__/privacy-policy/....
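One possible mechanism for the repeating domain/privacy-policy segments (an assumption, not confirmed in this thread) is a link written without a scheme, such as href="example.com/privacy-policy". Standard URL resolution treats such an href as a path relative to the current page, so every crawl hop from a /search/... page appends more segments, and each new URL is again a search that returns a cacheable page. The sketch uses Python's standard `urllib.parse.urljoin`; `example.com`, `LINKS` and `crawl_chain` are hypothetical placeholders.

```python
from typing import List
from urllib.parse import urljoin

# Hypothetical hrefs found on the page, written WITHOUT "https://".
# Because they have no scheme (and no leading "//"), URL resolution treats
# them as paths relative to the current page, not as links to another host.
LINKS = ["example.com/privacy-policy", "example.com"]


def crawl_chain(start: str, picks: List[int]) -> List[str]:
    """Follow LINKS[i] for each i in picks, returning every resolved URL."""
    url, visited = start, []
    for i in picks:
        url = urljoin(url, LINKS[i])
        visited.append(url)
    return visited


start = "https://example.com/search/%7bsearch_term_string%7d/"
for resolved in crawl_chain(start, [0, 0, 1, 0]):
    print(resolved)
```

Mixing the two hypothetical links in different orders yields arbitrary chains of domain and privacy-policy segments, similar to the cached paths above.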

    Any clues as to what is happening? Why are all these pages cached?

    I guess I could disable page caching for anything under /search/, but that does not really explain what is happening. I could also disable WordPress search, since that site does not need it, but I would prefer not to.

    Thank you!

    PS: The option “Cache URIs with query string variables” is unchecked (and cannot actually be enabled).

    • This topic was modified 1 year, 6 months ago by Lyk.
Viewing 8 replies - 1 through 8 (of 8 total)
  • Plugin Contributor Marko Vasiljevic

    (@vmarko)

    Hello @lyk-1

    Thank you for reaching out and I am happy to help!

    Can you please check whether any strings have been added to the Performance > Page Cache > Advanced > Accepted query strings field?
    If so, can you please manually delete the folder in question, remove the strings, save all settings, purge the cache, and let me know if the problem persists?

    Thanks!

    Thread Starter Lyk

    (@lyk-1)

    Hello @vmarko,

    The Accepted query strings are the following, which I believe are the defaults after setup:

    _branch_match_id
    _bta_c
    _bta_tid
    _ga
    _gl
    _ke
    adgroupid
    adid
    age-verified
    ao_noptimize
    campaignid
    campid
    cn-reloaded
    customid
    dm_i
    ef_id
    epik
    fb_action_ids
    fb_action_types
    fb_source
    fbclid
    gclid
    gclsrc
    gdffi
    gdfms
    gdftrk
    hsa_acc
    hsa_ad
    hsa_cam
    hsa_grp
    hsa_kw
    hsa_mt
    hsa_net
    hsa_src
    hsa_tgt
    hsa_ver
    igshid
    matomo_campaign
    matomo_cid
    matomo_content
    matomo_group
    matomo_keyword
    matomo_medium
    matomo_placement
    matomo_source
    mc_cid
    mc_eid
    mkcid
    mkevt
    mkrid
    mkwid
    msclkid
    mtm_campaign
    mtm_cid
    mtm_content
    mtm_group
    mtm_keyword
    mtm_medium
    mtm_placement
    mtm_source
    pcrid
    piwik_campaign
    piwik_keyword
    piwik_kwd
    pk_campaign
    pk_cid
    pk_content
    pk_keyword
    pk_kwd
    pk_medium
    pk_source
    pp
    redirect_log_mongo_id
    redirect_mongo_id
    ref
    s_kwcid
    sb_referer_host
    si
    sscid
    toolid
    trk_contact
    trk_module
    trk_msg
    trk_sid
    usqp
    utm_campaign
    utm_content
    utm_expid
    utm_id
    utm_medium
    utm_source
    utm_term

    “manually delete the folder in question and remove the strings, save all settings and purge the cache and let me know if the problem persists.”

    I have deleted the folder a couple of times over the last few weeks, and the problem persists. Do you mean checking again without any strings in the setting? I just did this and will report back.
    But most of these should be cached, e.g. the UTM parameters.


    I also noticed a search term in one of the cached pages that looked like bots probing for vulnerabilities. But the main questions are whether /search/ pages are cached by default and why the structure is so oddly recursive.

    My current theory is the following:
    Bots are spamming/probing the search for vulnerabilities. Instead of requesting GET /?s=test/test/test, they use /search/test/test/test. Both URLs search for test/test/test. The first form carries a query string, so it is not page-cached, but the second confuses the page cache because it looks like any other path. So each brute-force query is saved as a separate cached page.
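This theory can be checked with a small model. In the sketch below, `SEARCH_RULE` approximates WordPress's default `search/(.+)/?$` rewrite rule (ignoring its feed and paging variants), and `page_cache_dir` is a hypothetical stand-in for a disk-based page cache that keys files on host plus path and skips any URL with a query string. Neither is W3 Total Cache's actual code. Because `(.+)` needs at least one character, a bare /search/ is not a search, which fits the 404 described earlier.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Approximation of WordPress's default search rewrite rule. WordPress trims
# leading/trailing slashes from the request path before matching rules.
SEARCH_RULE = re.compile(r"^search/(.+)/?$")


def search_term(url: str) -> Optional[str]:
    """Return the term WordPress would search for, or None if not a search."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "s" in query:  # classic /?s=... form
        return query["s"][0]
    match = SEARCH_RULE.match(parts.path.strip("/"))  # pretty /search/... form
    return match.group(1) if match else None


def page_cache_dir(url: str) -> Optional[str]:
    """Hypothetical disk-cache key: host + path. URLs with a query string are
    not cached at all, mirroring "Cache URIs with query string variables"
    being unchecked. This is NOT W3 Total Cache's real implementation."""
    parts = urlsplit(url)
    if parts.query:
        return None
    return f"{parts.netloc}/{parts.path.strip('/')}"


for url in [
    "https://example.com/?s=test/test/test",
    "https://example.com/search/test/test/test",
    "https://example.com/search/test/test/test/test",
    "https://example.com/search/",
]:
    print(url, "->", search_term(url), "|", page_cache_dir(url))
```

Under these assumptions, both forms search for the same term, but only the path form produces a cache entry, and every extra path segment produces a new one.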

    Any thoughts?

    • This reply was modified 1 year, 6 months ago by Lyk.
    Thread Starter Lyk

    (@lyk-1)

    Hello again,

    Even with no strings under Accepted query strings, the issue keeps happening.

    I have now tried excluding /search* via “Never cache the following pages” and will report back.

    Thread Starter Lyk

    (@lyk-1)

    Excluding the /search* seems to fix the issue permanently.
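Why the exclusion works can be sketched as follows, assuming (not confirmed here; check the plugin's own help text) that each “Never cache the following pages” entry is applied as a case-insensitive regular expression searched for anywhere in the request URI. `NEVER_CACHE` and `is_excluded` are hypothetical names.

```python
import re

# Assumption: each "Never cache the following pages" entry is applied as a
# case-insensitive regular expression searched for anywhere in the request URI.
NEVER_CACHE = ["/search*"]


def is_excluded(request_uri: str, patterns=NEVER_CACHE) -> bool:
    """True if any non-empty pattern matches somewhere in the request URI."""
    return any(
        re.search(p.strip(), request_uri, re.IGNORECASE)
        for p in patterns
        if p.strip()
    )


for uri in ["/search/test/test/test", "/privacy-policy/", "/searching-tips/"]:
    print(uri, is_excluded(uri))
```

Read as a regular expression, `/search*` means "/searc" followed by zero or more "h", so under this assumption it would also exclude a page such as /searching-tips/; an anchored entry like `^/search/` would be narrower.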

    So, from the plugin’s side, the issue seems to be that searches like
    /search/test/test/test (which is the same as /?s=test/test/test in WordPress) confuse the page cache mechanism because they look like any other path.

    Are any fixes planned for that?

    Thank you!

    • This reply was modified 1 year, 5 months ago by Lyk. Reason: added a tag
    Plugin Contributor Marko Vasiljevic

    (@vmarko)

    Hello @lyk-1

    Thank you for your feedback.
    Ah, I see. So W3 Total Cache does not know that the page is a 404: the pages are cached rather than treated as 404s.
    That means either the “Cache 404” option is enabled in Performance > Page Cache, or, if it is not, the website behaves in such a way that the is_404() WordPress call returns false for 404 pages, which is not correct. As a result, W3TC doesn’t know it’s a 404 and caches it.
    I hope this helps!

    Thanks!

    Thread Starter Lyk

    (@lyk-1)

    Hello @vmarko,

    No, it is not about 404s. A search results page with no results is not a 404; it is a normal page with a 200 status.

    It is about searching via the endpoint example.com/search/test/test/test.
    In WordPress, this searches for test/test/test.

    This case is not handled correctly by W3 Total Cache.

    Plugin Contributor Marko Vasiljevic

    (@vmarko)

    Hello @lyk-1

    Thank you for your feedback.
    Maybe I misunderstood the information you provided:
    “The site has no /search/ URL. It is a 404.”
    This is very interesting indeed, because this kind of issue has not been reported to us before.
    I’ll make sure to pass this information to the dev team so we can check this more.

    For the time being, please exclude this path from the cache.

    Thanks!

    Thread Starter Lyk

    (@lyk-1)

    Hello again,

    Indeed, domain/search/ does not exist and returns a 404. But domain/search/_term_ is treated by WordPress as a search query and returns a results page for _term_.

    Yes, I will keep it excluded for now. Please let us know when there are any relevant updates from the dev team.

    Thank you!


The topic ‘Enhanced Page cache creates recursive’ is closed to new replies.