$bad_accept_header – reasoning for requiring ‘text/html’?

willshouse
(@willshouse)

4 months, 2 weeks ago

I have a lot of bots crawling my site, and I’ve noticed they are not getting the cached versions of pages. After digging in, I found this is caused by “private static function is_excluded()” returning true whenever the $bad_accept_header variable is true.

That variable is set as:

$bad_accept_header = ( isset( $_SERVER['HTTP_ACCEPT'] ) && false === strpos( $_SERVER['HTTP_ACCEPT'], 'text/html' ) );

So when bots (or a command-line tool like curl) request the site, $_SERVER['HTTP_ACCEPT'] is */*, this variable is set to true, and the cache is bypassed.

But when a browser requests a page, it sends a more complete Accept header such as text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8, which contains 'text/html' and therefore passes this check.
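
Pulling the condition out into a plain function makes the difference easy to reproduce from the command line. This is a minimal sketch: the function name is_bad_accept_header() is hypothetical, and null stands in for a request with no Accept header (where isset() fails).

```php
<?php
// Standalone copy of the $bad_accept_header condition from is_excluded().
// The function name is hypothetical; $accept is null when no Accept header was sent.
function is_bad_accept_header(?string $accept): bool
{
    // Mirrors: isset( $_SERVER['HTTP_ACCEPT'] ) && false === strpos( ..., 'text/html' )
    return $accept !== null && false === strpos($accept, 'text/html');
}

$curl_accept    = '*/*';
$browser_accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';

var_dump(is_bad_accept_header($curl_accept));    // bool(true): excluded from the cache
var_dump(is_bad_accept_header($browser_accept)); // bool(false): cache can be used
var_dump(is_bad_accept_header(null));            // bool(false): no header, not excluded
```

So a request that sends no Accept header at all is cached, while one that explicitly says it accepts anything (*/*) is not.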

I do see there is a filter, 'cache_enabler_bypass_cache', which can be used to enable the cache anyway, but I am curious about the reasoning for bypassing the cache whenever $_SERVER['HTTP_ACCEPT'] does not specifically contain 'text/html'.

Perhaps this could be turned into a setting, or a constant or filter could be added to tweak just this check without having to re-create the rest of the logic in the is_excluded() function (as is currently necessary when cache_enabler_bypass_cache is used).
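
As a rough illustration of the kind of tweak I mean, here is a sketch of a narrower check that also treats wildcards covering HTML (text/* and */*) as acceptable. The function accepts_html() and its $accepted list are hypothetical and not part of Cache Enabler; the list is a parameter only to show where a setting, constant, or new filter could plug in. It also ignores q-values (including q=0), so it is not a full Accept-header parser.

```php
<?php
// Hypothetical replacement for the $bad_accept_header test (not Cache Enabler API).
// Returns true if the Accept header names text/html or a wildcard in $accepted.
function accepts_html(?string $accept, array $accepted = ['text/html', 'text/*', '*/*']): bool
{
    // No Accept header: behave like the current check and do not exclude.
    if ($accept === null) {
        return true;
    }
    foreach (explode(',', $accept) as $range) {
        // Strip parameters such as ";q=0.8" and whitespace; q-values are ignored here.
        $type = strtolower(trim(explode(';', $range)[0]));
        if (in_array($type, $accepted, true)) {
            return true;
        }
    }
    return false;
}

var_dump(accepts_html('*/*'));              // bool(true): curl and most bots now get the cache
var_dump(accepts_html('application/json')); // bool(false): still excluded

// How the existing variable could be derived from it:
$bad_accept_header = ! accepts_html($_SERVER['HTTP_ACCEPT'] ?? null);
```

Keeping the accepted types in one list means only this check would need adjusting, rather than re-implementing all of is_excluded() through cache_enabler_bypass_cache.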

0 replies
1 participant
Last reply from: willshouse
Last activity: 4 months, 2 weeks ago
Status: not resolved