$bad_accept_header – reasoning for requiring ‘text/html’?
I have a lot of bots crawling my site, and I've noticed they are not getting the cached versions of pages. After digging in, I see this is caused by "private static function is_excluded()" returning true because the $bad_accept_header variable is true.
That variable is set as: $bad_accept_header = ( isset( $_SERVER['HTTP_ACCEPT'] ) && false === strpos( $_SERVER['HTTP_ACCEPT'], 'text/html' ) );
So, when bots (or a command line tool like curl) request the site, $_SERVER['HTTP_ACCEPT'] is */*, this variable is set to true, and the cached page is not served.
But when a browser requests a page, it sends a more complete Accept header such as text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8, which then passes this check.
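To illustrate the difference, here is a minimal standalone sketch of that check run against both kinds of Accept header. The helper name has_bad_accept_header() and the sample arrays are hypothetical and only stand in for $_SERVER; they are not part of the plugin.

```php
<?php
// Standalone sketch of the Accept-header check quoted above.
// has_bad_accept_header() is a hypothetical helper, not plugin code.
function has_bad_accept_header( array $server ): bool {
    // True only when an Accept header is present and lacks 'text/html'.
    return isset( $server['HTTP_ACCEPT'] )
        && false === strpos( $server['HTTP_ACCEPT'], 'text/html' );
}

// Sample request data standing in for $_SERVER (assumed values).
$curl_request    = array( 'HTTP_ACCEPT' => '*/*' );
$browser_request = array( 'HTTP_ACCEPT' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' );
$no_header       = array();

var_dump( has_bad_accept_header( $curl_request ) );    // bool(true): cache is bypassed
var_dump( has_bad_accept_header( $browser_request ) ); // bool(false): cache can be used
var_dump( has_bad_accept_header( $no_header ) );       // bool(false): no Accept header at all
```

Note that a request with no Accept header at all passes the check, while one that sends */* does not, even though */* also accepts text/html.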
I do see there is a filter, 'cache_enabler_bypass_cache', which can be used to enable the cache anyway, but I am curious about the reasoning for disabling the cache specifically when $_SERVER['HTTP_ACCEPT'] does not contain 'text/html'.
Perhaps this could be turned into a setting or a constant, or a filter could be added to tweak just this check without having to re-create the rest of the logic in the is_excluded function (as is currently the case if cache_enabler_bypass_cache is used).